Abstract. This paper considers conducting inference about the effect
of a treatment (or exposure) on an outcome of interest. In the ideal
setting where treatment is assigned randomly, under certain assumptions
the treatment effect is identifiable from the observable data and
inference is straightforward. However, in other settings such as
observational studies or randomized trials with noncompliance, the
treatment effect is no longer identifiable without relying on
untestable assumptions. Nonetheless, the observable data often do
provide some information about the effect of treatment, that is, the
parameter of interest is partially identifiable. Two approaches are
often employed in this setting: (i) bounds are derived for the
treatment effect under minimal assumptions, or (ii) additional
untestable assumptions are invoked that render the treatment effect
identifiable and then sensitivity analysis is conducted to assess how
inference about the treatment effect changes as the untestable
assumptions are varied. Approaches (i) and (ii) are considered in
various settings, including assessing principal strata effects, direct
and indirect effects and effects of time-varying exposures. Methods for
drawing formal inference about partially identified parameters are also
discussed.

1. INTRODUCTION

In many areas of science, interest often lies in assessing the causal
effect of a treatment (or exposure) on some particular outcome of
interest. For example, researchers may be interested in estimating the
difference between the average outcomes when all individuals are
treated (exposed) versus when all individuals are not treated
(unexposed). When treatment is assigned randomly and there is perfect
compliance to treatment assignment, such treatment effects are
identifiable and inference about the effect of treatment proceeds in a
straightforward fashion. On the other hand, if the treatment assignment
mechanism is not known to the analyst or compliance is not perfect,
then these treatment effects are not identifiable from the observable
data.

RICHARDSON, HUDGENS, GILBERT AND FINE

A statistical parameter is considered identifiable if different values
of the parameter give rise to different probability distributions of
the observable random variables. A parameter is partially identifiable
if more than one value of the parameter gives rise to the same observed
data law, but the set of such values is smaller than the parameter
space. Traditionally, statistical inference has been restricted to the
situation when parameters are identifiable. More recent research has
considered methods for conducting inference about partially
identifiable parameters. This research has been motivated to some
extent by methods to evaluate causal effects of treatment, which are
frequently partially identifiable. For instance, causal estimands are
typically only partially identifiable in observational studies where
the treatment selection mechanism is not known to the analyst.
Noncompliance in randomized trials may also render treatment effects
partially identifiable and a large amount of research has been devoted
to drawing inference about treatment effects in the presence of
noncompliance. Partial identifiability also arises when drawing
inference about treatment effects within principal strata or effects
describing relationships between an outcome and a treatment that are
mediated by some intermediate variable.

In order to conduct inference about treatment effects that are
partially identifiable, two approaches are often employed: (i) bounds
are derived for the treatment effect under minimal assumptions, or (ii)
additional untestable assumptions are invoked under which the treatment
effect is identifiable and then sensitivity analysis is conducted to
assess how inference about the treatment effect changes as the
untestable assumptions are varied. Below (i) and (ii) are illustrated
in five settings. In Section 2, we consider treatment effect bounds and
sensitivity analysis when the treatment assignment mechanism is
unknown. In Section 3, partial identifiability of principal strata
causal effects is discussed. In Section 4, the setting of noncompliance
is considered where there is interest in assessing the effect of
treatment if there were perfect compliance. In Section 5, bounds and
sensitivity analysis for direct and indirect effects in mediation
analysis are presented, and in Section 6 longitudinal treatment effects
are considered. Much of the literature on bounds and sensitivity
analysis focuses on ignorance due to partial identifiability and tends
to ignore uncertainty due to sampling error. Section 7 presents some
methods that appropriately quantify this uncertainty when drawing
inference about partially identifiable treatment effects. Section 8
concludes with a discussion.


2. TREATMENT SELECTION 
2.1 Minimal Assumptions Bounds 

Suppose we have a random sample of individuals where each potentially
receives treatment or control. Unless otherwise indicated, let $Z$
indicate treatment received where $Z = 1$ denotes treatment and $Z = 0$
denotes control. Denote the observed outcome of interest by $Y$. In
order to define a treatment effect on the outcome $Y$, we first define
potential outcomes for an individual when receiving treatment, denoted
$Y(1)$, and when receiving control, denoted $Y(0)$. Throughout this
paper, we invoke the stable unit treatment value assumption (SUTVA;
Rubin (1980)), that is, there is no interference between units and
there are no hidden (unrepresented) forms of treatment such that each
individual has two potential outcomes $\{Y(0), Y(1)\}$. The no hidden
forms of treatment assumption guarantees that the observed outcome is
equal to the potential outcome corresponding to the observed treatment,
namely that $Y = Y(z)$ for $Z = z$. Here, this will be referred to as
causal consistency; for further discussion of causal consistency see
Pearl (2010) and references therein. Once an individual receives
treatment $Z$, the potential outcome $Y(Z)$ is observed and the other
potential outcome (or counterfactual) $Y(1 - Z)$ becomes missing.
Assume that $n$ i.i.d. copies of $(Z, Y)$ are observed and denoted by
$(Z_i, Y_i)$ for $i = 1, \ldots, n$.

In this section, we consider treatment effect bounds when the treatment
assignment mechanism is unknown. Here, $Z$ can be thought of as
treatment selection by the individual or by nature, rather than random
treatment assignment as in an experiment. Define the average treatment
effect ATE to be $E[Y(1) - Y(0)] = E[Y(1)] - E[Y(0)]$ where $E$ denotes
the expected value. The ATE can be decomposed as

$$
\sum_{z=0}^{1} E[Y(1) \mid Z = z] \Pr[Z = z]
  - \sum_{z=0}^{1} E[Y(0) \mid Z = z] \Pr[Z = z]. \tag{1}
$$

Note $E[Y(z) \mid Z = z] = E[Y \mid Z = z]$ by causal consistency.
Thus, from the observed data $E[Y(z) \mid Z = z]$ and $\Pr[Z = z]$ are
identifiable and can be consistently estimated by their empirical
counterparts. On the other hand, the observed data provide no
information about $E[Y(z) \mid Z = 1 - z]$, such that (1) is only
partially identifiable without additional assumptions.

Bounds on $E[Y(1) - Y(0)]$ can be obtained by entertaining the smallest
and largest possible values for $E[Y(z) \mid Z = 1 - z]$. If $Y(1)$ and
$Y(0)$ are not bounded then bounds on $E[Y(1) - Y(0)]$ will be
completely uninformative, ranging from $-\infty$ to $\infty$. Thus,
informative bounds are only possible if $Y(0)$ and $Y(1)$ are bounded.
Because any bounded variable can be rescaled to take values in the unit
interval, without loss of generality assume $Y(z) \in [0, 1]$ for
$z = 0, 1$. Then $0 \le E[Y(z) \mid Z = 1 - z] \le 1$ and from (1) it
follows that $E[Y(1) - Y(0)]$ is bounded below by setting
$E[Y(1) \mid Z = 0] = 0$ and $E[Y(0) \mid Z = 1] = 1$, which yields the
lower bound

$$
E[Y(1) \mid Z = 1] \Pr[Z = 1] - E[Y(0) \mid Z = 0] \Pr[Z = 0] - \Pr[Z = 1]. \tag{2}
$$

Similarly, $E[Y(1) - Y(0)]$ is bounded above by setting
$E[Y(1) \mid Z = 0] = 1$ and $E[Y(0) \mid Z = 1] = 0$, which yields the
upper bound

$$
E[Y(1) \mid Z = 1] \Pr[Z = 1] - E[Y(0) \mid Z = 0] \Pr[Z = 0] + \Pr[Z = 0]. \tag{3}
$$

These bounds were derived independently by Robins (1989) and Manski
(1990). The lower and upper bounds (2) and (3) are sharp in the sense
that it is not possible to derive narrower bounds without additional
assumptions. Note the interval formed by (2) and (3) is contained in
$[-1, 1]$ and is of width 1. Thus, the bounds are informative in that
the treatment effect is now restricted to half of the otherwise
possible range $[-1, 1]$. On the other hand, the bounds will always
contain the null value 0 corresponding to no average treatment effect.
That is, without additional assumptions the sign of the treatment
effect cannot be determined from the observable data.
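As a minimal sketch of how the empirical counterparts of (2) and (3) might be computed, the Python function below takes a binary treatment indicator and an outcome already scaled to $[0, 1]$. The function name `no_assumption_bounds` and the toy data are illustrative assumptions and do not come from any study discussed here.

```python
import numpy as np


def no_assumption_bounds(y, z):
    """Empirical no-assumptions bounds (2) and (3) on the ATE.

    y : outcomes scaled to the unit interval [0, 1]
    z : binary treatment indicator (1 = treatment, 0 = control)
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    if y.min() < 0.0 or y.max() > 1.0:
        raise ValueError("outcomes must be rescaled to [0, 1]")

    p1 = z.mean()            # estimate of Pr[Z = 1]
    p0 = 1.0 - p1            # estimate of Pr[Z = 0]
    m1 = y[z == 1].mean()    # estimate of E[Y | Z = 1]
    m0 = y[z == 0].mean()    # estimate of E[Y | Z = 0]

    observed = m1 * p1 - m0 * p0
    lower = observed - p1    # sets E[Y(1)|Z=0] = 0 and E[Y(0)|Z=1] = 1
    upper = observed + p0    # sets E[Y(1)|Z=0] = 1 and E[Y(0)|Z=1] = 0
    return lower, upper


# Hypothetical toy data: three treated and two untreated individuals.
z = np.array([1, 1, 1, 0, 0])
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
lower, upper = no_assumption_bounds(y, z)
print(f"ATE bounds: [{lower:.2f}, {upper:.2f}]")
```

Whatever the data, the two values returned differ by exactly 1 and the interval contains 0, in line with the properties noted above.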

2.2 Additional Assumptions 

The bounds (2)-(3) are sometimes called the “no assumptions” or “worst
case” bounds because no assumptions are made about the effect of
treatment in the population (Lee (2005); Morgan and Winship (2007)).
The only assumptions made in deriving (2) and (3) are SUTVA and that
the observed data constitute a random sample. If additional assumptions
are invoked, the treatment effect bounds may become tighter (i.e.,
narrower) or even collapse to a point (i.e., the treatment effect may
become identifiable). Sometimes these additional assumptions will have
implications that are testable based on the observed data. Should the
observed data provide evidence against an assumption under
consideration, then bounds should be computed without making this
assumption.

An example of an additional assumption is mean independence, that is,

$$
E[Y(z) \mid Z = 0] = E[Y(z) \mid Z = 1] \quad \text{for } z = 0, 1. \tag{4}
$$

Under (4) ATE is identifiable. Specifically the upper and lower bounds
for ATE both equal $E[Y(1) \mid Z = 1] - E[Y(0) \mid Z = 0]$, which is
identifiable from the observable data and can be consistently estimated
by the “naive” estimator given by the difference in sample means
between the groups of individuals receiving treatment and control.
Assumption (4) will hold in experiments where treatment is randomly
assigned as in a randomized clinical trial. Moreover, in randomized
experiments the stronger assumption

$$
Y(z) \perp\!\!\!\perp Z \quad \text{for } z = 0, 1, \tag{5}
$$

will hold, where $\perp\!\!\!\perp$ denotes independence. Independent
treatment assignment (5) implies mean independence (4).

In some settings it may be reasonable to consider additional
assumptions that are not as strong as (4) or (5) but nonetheless lead
to tighter bounds than (2) and (3). For example, monotonicity type
assumptions might be considered, such as monotone treatment selection
(MTS)

$$
E[Y(z) \mid Z = 1] \ge E[Y(z) \mid Z = 0] \quad \text{for } z = 0, 1. \tag{6}
$$

MTS assumes individuals who select treatment will on average have
outcomes greater than or equal to those of individuals who do not
select treatment under the counterfactual scenario in which all
individuals select the same $z$. Manski and Pepper (2000) consider MTS
when examining the effect of returning to school on wages later in
life. For this example, MTS implies individuals who choose to return to
school will have higher wages on average compared to individuals who
choose to not return to school under the counterfactual scenario in
which no individuals return to school. Alternatively, one might assume
monotone treatment response (MTR)

$$
\Pr[Y(1) \ge Y(0)] = 1
$$

(Manski (1997)). MTR assumes that under treatment each individual will
have a response greater than or equal to that under control. For
instance, suppose $Z = 1$ if an individual elects to get the annual
influenza vaccine and $Z = 0$ otherwise, and let $Y(z) = 1$ if an
individual subsequently does not develop flu-like symptoms when
$Z = z$, and $Y(z) = 0$ otherwise. MTR asserts that each individual is
more or as likely to not develop flu-like symptoms if they are
vaccinated versus if they are unvaccinated. Given that to date there is
no evidence that the annual flu vaccine enhances the probability of
acquiring influenza, MTR might be plausible for this example.

Assuming MTS or MTR can lead to narrower bounds than (2) and (3)
because they imply additional constraints on unobserved counterfactual
expectations. For example, assuming MTS, $E[Y(0) \mid Z = 1]$ is
bounded below by $E[Y(0) \mid Z = 0]$ and $E[Y(1) \mid Z = 0]$ is
bounded above by $E[Y(1) \mid Z = 1]$, implying the upper bound on
$E[Y(1) - Y(0)]$ is

$$
E[Y(1) \mid Z = 1] - E[Y(0) \mid Z = 0], \tag{7}
$$

for which the naive estimator is consistent. Under MTS, the lower bound
remains (2). In contrast to the no assumptions bounds, assuming MTS the
bounds may exclude 0, specifically when (7) is negative. MTR implies
$E[Y(1)] \ge E[Y(0)]$ which in turn implies that the ATE lower bound is
0. Under MTR, the upper bound remains (3).
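The way MTS and MTR each replace one side of the no assumptions interval can be sketched in a few lines of Python. The function `ate_bounds`, its argument names and the input values are hypothetical and chosen only for illustration; the inputs stand for $E[Y \mid Z = 1]$, $E[Y \mid Z = 0]$ and $\Pr[Z = 1]$ with the outcome scaled to $[0, 1]$.

```python
def ate_bounds(mean_treated, mean_control, p_treated, assumption="none"):
    """Bounds on the ATE from summary quantities.

    mean_treated : E[Y | Z = 1], in [0, 1]
    mean_control : E[Y | Z = 0], in [0, 1]
    p_treated    : Pr[Z = 1], strictly between 0 and 1
    assumption   : "none", "MTS" or "MTR"
    """
    if not (0.0 <= mean_treated <= 1.0 and 0.0 <= mean_control <= 1.0):
        raise ValueError("conditional means must lie in [0, 1]")
    if not (0.0 < p_treated < 1.0):
        raise ValueError("p_treated must lie strictly between 0 and 1")

    p_control = 1.0 - p_treated
    observed = mean_treated * p_treated - mean_control * p_control
    lower = observed - p_treated      # bound (2)
    upper = observed + p_control      # bound (3)

    if assumption == "MTS":
        upper = mean_treated - mean_control   # bound (7) replaces (3)
    elif assumption == "MTR":
        lower = 0.0                           # 0 replaces (2)
    elif assumption != "none":
        raise ValueError("assumption must be 'none', 'MTS' or 'MTR'")
    return lower, upper


# Hypothetical inputs, not taken from any study.
none_bounds = ate_bounds(0.4, 0.6, 0.5)
mts_bounds = ate_bounds(0.4, 0.6, 0.5, assumption="MTS")
mtr_bounds = ate_bounds(0.4, 0.6, 0.5, assumption="MTR")
print(none_bounds, mts_bounds, mtr_bounds)
```

With these inputs the MTS interval lies entirely below 0, which is the case described above in which (7) is negative.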

2.3 AZT Example 

To illustrate the bounds above, consider a hypothetical study of 2000
HIV patients (from Figure 2 of Robins (1989)) where 1400 individuals
elected to take the drug AZT and 600 elected not to take AZT (this is a
simplified version of the problem Robins considers). The outcome of
interest is death or survival at a given time point. Of the 2000
patients, 1000 died with exactly 500 from each group. Let $Z = 1$ if
the patient elected to take AZT and $Z = 0$ otherwise; let $Y = 1$ if
the individual died and 0 otherwise. The naive estimator, that is, the
difference in sample means between $Z = 1$ and $Z = 0$, equals
$500/1400 - 500/600 \approx -0.48$. The empirical estimates of the no
assumptions bounds (2) and (3) equal $-0.7$ and $0.3$. In this setting,
the MTS assumption (6) supposes that individuals who elected to take
AZT would have been more or as likely to die as individuals who did not
take AZT in the counterfactual scenarios where everyone receives
treatment or everyone does not receive treatment. This might be
reasonable if it is thought that those who took AZT were on average
less healthy than those who did not. Assuming MTS, the upper bound (7)
is estimated to be $-0.48$. Thus, in this example the MTS bounds are
substantially tighter than the no assumption bounds. The estimated MTS
bounds lead to the conclusion (ignoring sampling variability, a point
which we return to later) that AZT reduces the probability of death by
at least 0.48 whereas without the MTS assumption we cannot even
conclude whether the effect of treatment is nonzero.
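The figures quoted in this example can be checked by hand from the counts given. Writing hats for empirical estimates (a notational assumption made here for the sketch), the arithmetic is:

$$
\begin{aligned}
\widehat{\Pr}[Z = 1] &= 1400/2000 = 0.7, \qquad \widehat{\Pr}[Z = 0] = 600/2000 = 0.3, \\
\widehat{E}[Y \mid Z = 1] &= 500/1400 \approx 0.357, \qquad \widehat{E}[Y \mid Z = 0] = 500/600 \approx 0.833, \\
\text{naive estimate} &= 500/1400 - 500/600 \approx -0.476, \\
\text{lower bound (2)} &= (500/1400)(0.7) - (500/600)(0.3) - 0.7 = 0.25 - 0.25 - 0.7 = -0.7, \\
\text{upper bound (3)} &= (500/1400)(0.7) - (500/600)(0.3) + 0.3 = 0.25 - 0.25 + 0.3 = 0.3, \\
\text{MTS upper bound (7)} &= 500/1400 - 500/600 \approx -0.476.
\end{aligned}
$$

The no assumptions interval $[-0.7, 0.3]$ has width 1 and contains 0, while the MTS interval $[-0.7, -0.476]$ does not.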

2.4 Sensitivity Analysis 

Assumptions such as (4) or (5) which identify the ATE, or assumptions
such as MTS which sharpen the bounds, cannot be tested empirically
because such assumptions pertain to the counterfactual distribution of
$Y(z)$ given $Z = 1 - z$. Robins and others (e.g., see Robins,
Rotnitzky and Scharfstein (2000); Scharfstein, Rotnitzky and Robins
(1999)) have argued that a data analyst should conduct sensitivity
analysis to explore how inference varies as a function of departures
from any untestable assumptions.

For instance, a departure from assumption (5) might be due to the
existence of an unmeasured variable $U$ associated with both treatment
selection $Z$ and the potential outcomes $Y(z)$ for $z = 0, 1$; a
variable such as $U$ is often referred to as an unmeasured confounder.
Under this scenario, one might postulate that
$Y(z) \perp\!\!\!\perp Z \mid U$ for $z = 0, 1$ rather than (5).
Sensitivity analysis proceeds by examining how inference drawn about
ATE varies as a function of the magnitude of the association of $U$
with $Z$, $Y(0)$, and $Y(1)$. This idea has roots as early as Cornfield
et al. (1959), who demonstrated the plausibility of a causal effect of
cigarette smoking ($Z$) on lung cancer ($Y$) by arguing that the
absence of such a relationship was only possible if there existed an
unmeasured factor $U$ associated with cigarette use that was at least
as strongly associated with lung cancer as cigarette use. This idea was
further developed by Schlesselman (1978), Rosenbaum and Rubin (1983),
Lin, Psaty and Kronmal (1998), Hernán and Robins (1999) and
VanderWeele and Arah (2011) among others.

To illustrate this approach, suppose in the AZT example above that the
analyst first assumes (5) holds, and thus estimates the effect of AZT
to be $-0.48$. To proceed with sensitivity analysis, the analyst posits
the existence of an unmeasured binary variable $U$ and assumes that
$Y(z) \perp\!\!\!\perp Z \mid U$ for $z = 0, 1$. Similar to
VanderWeele and Arah (2011), let

$$
c(z) = \{E[Y(z) \mid U = 1] - E[Y(z) \mid U = 0]\}
       \cdot \{\Pr[U = 1 \mid Z = z] - \Pr[U = 1]\}.
$$

Then under the assumption that $Y(z) \perp\!\!\!\perp Z \mid U$ for
$z = 0, 1$, the naive estimator converges in probability to
$E[Y(1)] - E[Y(0)] + c(1) - c(0)$. Thus the naive estimator is
asymptotically unbiased if and only if $c(1) = c(0)$. For an
alternative decomposition of the asymptotic bias of the naive
estimator, see Morgan and Winship (2007, Section 2.6.3).
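The limit stated above follows from a short calculation, sketched here for binary $U$. Causal consistency gives $E[Y \mid Z = z] = E[Y(z) \mid Z = z]$, and conditional independence of $Y(z)$ and $Z$ given $U$ lets the conditioning on $Z$ be dropped within levels of $U$, so that

$$
\begin{aligned}
E[Y \mid Z = z] - E[Y(z)]
  &= \sum_{u=0}^{1} E[Y(z) \mid U = u] \, \{\Pr[U = u \mid Z = z] - \Pr[U = u]\} \\
  &= \{E[Y(z) \mid U = 1] - E[Y(z) \mid U = 0]\} \, \{\Pr[U = 1 \mid Z = z] - \Pr[U = 1]\} \\
  &= c(z),
\end{aligned}
$$

where the second line uses $\Pr[U = 0 \mid Z = z] - \Pr[U = 0] = -\{\Pr[U = 1 \mid Z = z] - \Pr[U = 1]\}$. Subtracting the expression for $z = 0$ from that for $z = 1$ gives $E[Y \mid Z = 1] - E[Y \mid Z = 0] = E[Y(1)] - E[Y(0)] + c(1) - c(0)$, the probability limit of the naive estimator.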

Sensitivity analysis proceeds by making varying assumptions about the
unidentifiable associations of $U$ with $Y(0)$, $Y(1)$ and $Z$. Under
the most extreme of these assumptions, the bounds (2) and (3) are
recovered. In particular, the upper bound in (3) is achieved when
$\Pr[U = 1 \mid Z = 1] = 0$, $\Pr[U = 1 \mid Z = 0] = 1$,
$E[Y(1) \mid U = 1] = 1$ and $E[Y(0) \mid U = 0] = 0$, meaning that the
confounder $U$ is perfectly negatively correlated with treatment $Z$
and that if the confounder is present ($U = 1$), then a treated
individual will die, whereas if the confounder is absent ($U = 0$),
then an untreated individual will survive. The lower bound (2) is
achieved under the opposite conditions.

In practice the extreme associations of $U$ with $Y(0)$, $Y(1)$, and
$Z$ leading to the bounds might be considered unrealistic. Instead the
analyst might consider associations only in a range deemed plausible by
subject matter experts. In order to arrive at an accurate range, care
should be taken in communicating the meaning of these associations, and
the range should be elicited in a manner that avoids data-driven
choices. Alternatively, the degree of association required to change
the sign of the effect of interest might be determined. For instance,
suppose the analyst further assumes that
$E[Y(z) \mid U = 1] - E[Y(z) \mid U = 0]$ does not depend on $z$. This
assumption will hold if the effect of $Z$ on $Y$ is the same if $U = 0$
or $U = 1$. Letting $\gamma_0 = E[Y(z) \mid U = 1] - E[Y(z) \mid U = 0]$
and $\gamma_1 = \Pr[U = 1 \mid Z = 1] - \Pr[U = 1 \mid Z = 0]$, the
asymptotic bias of the naive estimator is then given by
$\gamma_0 \gamma_1$ and a bias adjusted estimator is found by
subtracting $\gamma_0 \gamma_1$ from the naive estimator. Sensitivity
analysis may proceed by determining the values of $\gamma_0$ and
$\gamma_1$ for which the bias adjusted estimator of the ATE will have
the opposite sign of the naive estimator. For the AZT example, the bias
adjusted estimator will have the opposite sign of the naive estimator
if $\gamma_0 \gamma_1 < -0.48$. This indicates that the product of (i)
the difference in the mean potential outcomes between levels of the
confounder for both treatment and control, and (ii) the difference in
the prevalence of the unmeasured confounder between the treatment and
control groups must be less than $-0.48$. Such magnitudes might be
considered unlikely in the opinion of subject matter experts, in which
case the sensitivity analysis would support the existence of a
beneficial effect of AZT on survival among HIV+ men (ignoring sampling
variability). Note the observed data distribution places some
restrictions on the possible values of $(\gamma_0, \gamma_1)$, that is,
$(\gamma_0, \gamma_1)$ is partially identifiable. For instance, if
$\gamma_1 = 1$ then $\Pr[U = 1 \mid Z = 1] = 1$ and
$\Pr[U = 1 \mid Z = 0] = 0$ which implies
$E[Y(z) \mid U = u] = E[Y(z) \mid Z = u]$ and, therefore,
$\max\{E[Y(1) \mid Z = 1] - 1, -E[Y(0) \mid Z = 0]\} \le \gamma_0 \le
\min\{E[Y(1) \mid Z = 1], 1 - E[Y(0) \mid Z = 0]\}$. Such
considerations should be taken into account when determining the range
of values of $(\gamma_0, \gamma_1)$ in sensitivity analysis.
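A minimal Python sketch of this sign-change analysis for the AZT example is given below. The function name `bias_adjusted_ate` and the grid spacing of 0.1 are assumptions made for illustration, and the grid deliberately ignores the data-implied restrictions on the two sensitivity parameters just described, so it should be read as an outline rather than a complete analysis.

```python
def bias_adjusted_ate(naive, gamma0, gamma1):
    """Subtract the asymptotic bias gamma0 * gamma1 from the naive estimate."""
    return naive - gamma0 * gamma1


# Naive estimate from the AZT counts: 500/1400 - 500/600.
naive = 500 / 1400 - 500 / 600

# Hypothetical grid over [-1, 1] for both sensitivity parameters.
grid = [i / 10 for i in range(-10, 11)]

# Pairs for which the adjusted estimate is positive, i.e. has the
# opposite sign of the (negative) naive estimate.
sign_flips = [
    (g0, g1)
    for g0 in grid
    for g1 in grid
    if bias_adjusted_ate(naive, g0, g1) > 0
]

print(f"naive estimate: {naive:.3f}")
print(f"{len(sign_flips)} of {len(grid) ** 2} grid points reverse the sign")
```

Every pair in `sign_flips` has sensitivity parameters of opposite sign with a product below the naive estimate, which is the condition stated in the text.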

Because the data provide no evidence about $U$, VanderWeele (2008) and
VanderWeele and Arah (2011) recommend choosing $U$ and any simplifying
assumptions based on what is considered plausible by relevant
subject-matter experts. Such sensitivity analyses are most applicable
when the existence of unmeasured confounders is known, but these
factors could not be measured for logistical or other reasons. General
bias formulas to be used for sensitivity analyses of unmeasured
confounding for categorical or continuous outcomes, confounders and
treatments can be found in VanderWeele and Arah (2011).

In other settings, there might not be any known unmeasured confounders,
or it may be thought that there are numerous unmeasured confounders, in
which cases the sensitivity analysis strategy described above would not
be applicable or feasible. One general alternative approach entails
making additional untestable assumptions regarding the unobserved
potential outcome distributions. Typically, these assumptions (or
models) are indexed by one or more sensitivity analysis parameters
conditional upon which the causal estimand of interest is identifiable
(e.g., Scharfstein, Rotnitzky and Robins (1999); Brumback et al.
(2004)). Sensitivity analysis then proceeds by examining how inference
changes as assumed values of the parameters are varied over plausible
ranges. Examples of such sensitivity analyses are given below in
Sections 3.4 and 6.3.

2.5 Covariate Adjustment 

Typically in observational studies baseline (pre-treatment) covariates
$X$ will be collected in addition to $Z$ and $Y$. Incorporating
information from observed covariates can help sharpen inferences about
partially identified treatment effects. For example, incorporating
covariates will generally lead to narrower bounds (Scharfstein,
Rotnitzky and Robins (1999)). This follows because any treatment effect
compatible with the distribution of observed variables $(X, Y, Z)$ must
also be compatible with the distribution of $(Y, Z)$, that is, the
observable variables if we do not observe or choose to ignore $X$ (Lee
(2009)). Covariate adjusted bounds are discussed further in Section 3.3
below.

Additionally, incorporating covariates may lend plausibility to some of
the bounding assumptions discussed in Section 2.2. For example, in the
absence of randomized treatment assignment (4) or (5) may be dubious.
Instead of (4), it might be more plausible to assume

$$
E[Y(z) \mid Z = 0, X = x] = E[Y(z) \mid Z = 1, X = x] \quad \text{for } z = 0, 1. \tag{8}
$$

Similarly, assumption (5) might be replaced by

$$
Y(z) \perp\!\!\!\perp Z \mid X \quad \text{for } z = 0, 1, \tag{9}
$$

that is, each potential outcome is independent of treatment selection
conditional on some set of covariates. Assumption (9) is commonly
referred to as no unmeasured confounders. Assumptions such as (8) or
weaker inequalities similar to (6) such as

$$
E[Y(z) \mid Z = 1, X = x] \ge E[Y(z) \mid Z = 0, X = x] \quad \text{for } z = 0, 1,
$$

may be deemed plausible for certain levels of $X$, but not for others.
Availability of covariates also allows for the consideration of new
types of assumptions (e.g., see Chiburis (2010)).

To conduct covariate adjusted sensitivity analysis, departures from
identifying assumptions such as (9) can be explored. Similar to the
previous section, a departure from (9) might entail positing the
existence of an unmeasured variable $U$ associated with both treatment
selection $Z$ and the potential outcomes $Y(z)$ for $z = 0, 1$. Under
this scenario, one might postulate that
$Y(z) \perp\!\!\!\perp Z \mid \{X, U\}$ for $z = 0, 1$ rather than (9)
and sensitivity analysis proceeds by examining how inference varies as
a function of the magnitude of the association of $U$ with $Z$, $Y(0)$,
and $Y(1)$ given $X$. Similar to covariate adjusted bounds, smaller
associations or tighter regions of the values of the sensitivity
parameters may be deemed plausible within certain levels of $X$,
potentially yielding sharper inferences from the sensitivity analyses.
However, as cautioned by Robins (2002), care should be taken in clearly
communicating the meaning of such sensitivity parameters and their
relationship to covariates when eliciting plausible ranges from subject
matter experts. In some scenarios, plausible regions for sensitivity
parameters may in fact be wider when conditioning on $X$ than when not
conditioning on $X$.

3. PRINCIPAL STRATIFICATION 
3.1 Background 

Even if treatment is randomly assigned (e.g., as in a clinical trial),
the causal estimand of interest may still be only partially
identifiable. For example, in many studies it is often of interest to
draw inference about treatment effects on outcomes that only exist or
are meaningful after the occurrence of some observable intermediate
variable. For instance, in studies where some individuals die,
investigators might be interested in treatment effects only among
individuals alive at the end of the study. Unfortunately, estimands
defined by contrasting mean outcomes under treatment and control that
simply condition on this observable intermediate variable do not
measure a causal effect of treatment without additional assumptions.
One approach that may be employed in this scenario entails principal
stratification (Frangakis and Rubin (2002)). Principal stratification
uses the potential outcomes of the intermediate post-randomization
variable to define strata of individuals. Because these “principal
strata” are not affected by treatment assignment, treatment effect
estimands defined within principal strata have a causal interpretation
and do not suffer from the complications of standard post-randomization
adjusted estimands. The simple framework of principal stratification
has a wide range of applications. For a recent discussion of the
utility (and lack thereof) of principal stratification, see Pearl
(2011) and corresponding reader reactions.

As a motivating example for this section, we consider evaluating
vaccine effects on post-infection outcomes. In vaccine studies,
uninfected subjects are enrolled and followed for infection endpoints,
and infected subjects are subsequently followed for post-infection
outcomes such as disease severity or death due to infection with the
pathogen targeted by the vaccine; often interest is in assessing the
effect of vaccination on these post-infection endpoints (Hudgens and
Halloran (2006)). For example, Préziosi and Halloran (2003) present
data from a pertussis vaccine field study in Niakhar, Senegal. In this
study, 3845 vaccinated children and 1020 unvaccinated children were
followed for one year for pertussis. In the vaccine group, 548 children
contracted pertussis, of whom 176 had severe infections; in the
unvaccinated group 206 children contracted pertussis, of whom 129 had
severe infections. In this setting, investigators are interested in
assessing whether or not the vaccine had an effect on the severity of
infection.

When assessing such post-infection effects, a data analyst might
consider contrasts between study arms including all individuals under
study, or, alternatively, only those who become infected. Though
including all individuals in the study has the advantage of providing
valid inference about the overall effect of vaccination (assuming
independent treatment assignment), such an approach does not
distinguish vaccine effects on susceptibility to infection from effects
on the post-infection endpoint of interest. An analysis that conditions
on infection attempts to distinguish these effects and may be more
sensitive in detecting post-infection vaccine effects. However, because
the set of individuals who would become infected under control is not
likely to be the same as the set who would become infected if given the
vaccine, conditioning on infection might result in selection bias. For
example, those who would become infected under vaccine may tend to have
weaker immune systems than those who would become infected under
control, and thus may be more susceptible to severe infection. Because
of this potential selection bias, comparisons between infected
vaccinees and infected controls do not necessarily have causal
interpretations.

3.2 Principal Effects 

In this section, treatment is vaccination, with $Z = 1$ corresponding
to vaccination and $Z = 0$ corresponding to not being vaccinated.
Assume that assignment to vaccine is equivalent to receipt of vaccine,
that is, there is no noncompliance. Denote the potential infection
outcome by $S(z)$, where $S(z) = 0$ if uninfected and $S(z) = 1$ if
infected. Here, the focus is on evaluating the causal effect of vaccine
on $Y$, a post-infection outcome. For simplicity, we consider the case
where $Y$ is binary, indicating the presence of severe disease. If
$S(z) = 1$, define the potential post-infection outcome $Y(z) = 1$ if
the individual would have the worse (or more severe) post-infection
outcome of interest given $z$, and $Y(z) = 0$ otherwise. If an
individual’s potential infection outcome for treatment $z$ is
uninfected [i.e., $S(z) = 0$], then we adopt the convention that $Y(z)$
is undefined. In other words, it does not make sense to define the
severity of an infection in an individual who is not infected. This
convention is similar to that employed in other settings. For instance,
in the analysis of quality of life studies it might be assumed that
quality of life metrics are not well defined in those who are not alive
(Rubin (2000)).

Define a basic principal stratification $P_0$ according to the joint
potential infection outcomes $S^{P_0} = (S(0), S(1))$. The four basic
principal strata or response types are defined by the joint potential
infection outcomes, $(S(0), S(1))$, and are composed of immune (not
infected under both vaccine and placebo), harmed (infected under
vaccine but not placebo), protected (infected under placebo but not
vaccine), and doomed individuals (infected under both vaccine and
placebo). Note the only stratum where both potential post-infection
endpoints are well defined is the doomed basic principal stratum,
$S^{P_0} = (1, 1)$. Thus, defining a post-infection causal vaccine
effect is only possible in the doomed principal stratum
$S^{P_0} = (1, 1)$. Such a causal estimand will describe the effect of
vaccination on disease severity in individuals who would become
infected whether vaccinated or not. For instance, the vaccine effect on
disease severity may be defined by

$$
E[Y(1) \mid S^{P_0} = (1, 1)] - E[Y(0) \mid S^{P_0} = (1, 1)]. \tag{10}
$$

Frangakis and Rubin call treatment effect estimands such as (10)
“principal effects.”


3.3 Bounds 


Assume we observe $n$ i.i.d. copies of $(Z, S, Y)$ denoted by
$(Z_i, S_i, Y_i)$ for $i = 1, \ldots, n$. Also assume that the doomed
principal stratum is nonempty, $\Pr[S^{P_0} = (1, 1)] > 0$, so that the
principal effect in (10) is well defined. Bounds for (10) are presented
below under two additional assumptions: independent treatment
assignment, that is,

$$
Z \perp\!\!\!\perp \{Y(z), S(z)\} \quad \text{for } z = 0, 1 \tag{11}
$$

and monotone treatment response with respect to $S$, that is,

$$
\Pr[S(0) \ge S(1)] = 1. \tag{12}
$$

Assumption (11) will hold in randomized vaccine trials. Monotonicity
(12) assumes that the vaccine does no harm at the individual level,
that is, there are no individuals who would be infected if vaccinated
but uninfected if not vaccinated. Monotonicity is equivalent to
assuming the harmed principal stratum is empty. Note no such
monotonicity assumption is being made regarding $Y$. Under (11),
assumption (12) implies $\Pr[S = 1 \mid Z = 1] \le \Pr[S = 1 \mid Z = 0]$,
which is testable using the observed data. For the pertussis example,
the proportion infected in the vaccine group was less than in the
unvaccinated group; thus, assuming (11), the data do not provide
evidence against (12).

Assuming independent treatment assignment and monotonicity, (10) is partially identifiable from the observable data. The left term of (10) can be written

$$\begin{aligned} E[Y(1) \mid S^{P_0} = (1,1)] &= E[Y(1) \mid S(1) = 1] \\ &= E[Y(1) \mid S(1) = 1, Z = 1] \\ &= E[Y \mid S = 1, Z = 1], \end{aligned} \tag{13}$$

where the first equality holds under (12), the second equality under (11), and the third by causal consistency. On the other hand, the right term of (10) is only partially identifiable. To see this, note

$$\begin{aligned} E[Y(0) \mid S(0) = 1] &= E[Y(0) \mid S^{P_0} = (1,1)] \Pr[S(1) = 1 \mid S(0) = 1] \\ &\quad + E[Y(0) \mid S^{P_0} = (1,0)] \Pr[S(1) = 0 \mid S(0) = 1]. \end{aligned} \tag{14}$$

In (14), only $E[Y(0) \mid S(0) = 1]$ and $\Pr[S(1) = s \mid S(0) = 1]$ for $s = 0, 1$ are identifiable. In particular, $E[Y(0) \mid S(0) = 1] = E[Y \mid S = 1, Z = 0]$ by similar reasoning to (13), and

$$\Pr[S(1) = 1 \mid S(0) = 1] = \frac{\Pr[S(1) = 1]}{\Pr[S(0) = 1]} = \frac{\Pr[S = 1 \mid Z = 1]}{\Pr[S = 1 \mid Z = 0]},$$

where the first equality holds under (12) and the second under independent treatment assignment (and causal consistency). The other two terms in (14), namely $E[Y(0) \mid S^{P_0} = (1,1)]$ and $E[Y(0) \mid S^{P_0} = (1,0)]$, are only partially identifiable. In words, infected controls are a mixture of individuals in the protected and doomed principal strata, and without further assumptions the observed data do not identify exactly which infected controls are doomed. Therefore, the probability of severe disease when not vaccinated in the doomed principal stratum is not identified. Under (12), the data do however indicate what proportion of infected controls are doomed and this information provides partial identification of $E[Y(0) \mid S^{P_0} = (1,1)]$, and hence (10).

For fixed values of $E[Y(0) \mid S(0) = 1]$ and $\Pr[S(1) = 1 \mid S(0) = 1]$, any pair of expectations $(E[Y(0) \mid S^{P_0} = (1,1)], E[Y(0) \mid S^{P_0} = (1,0)]) \in [0,1]^2$ satisfying (14) will give rise to the same observed data distribution. Equation (14) describes a line segment with nonpositive slope intersecting the unit square as illustrated in Figure 1. An upper bound of $E[Y(0) \mid S^{P_0} = (1,1)]$ and thus a lower bound for (10) is achieved when the line intersects the right or lower side of the square, that is, when either

$$E[Y(0) \mid S^{P_0} = (1,1)] = 1 \quad \text{or} \quad E[Y(0) \mid S^{P_0} = (1,0)] = 0. \tag{15}$$

Together (14) and (15) imply $E[Y(0) \mid S^{P_0} = (1,1)]$ is bounded above by


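$$\min\left\{1, \frac{E[Y(0) \mid S(0) = 1]}{\Pr[S(1) = 1 \mid S(0) = 1]}\right\}. \tag{16}$$

Similarly, $E[Y(0) \mid S^{P_0} = (1,1)]$ is bounded below by

$$\max\left\{0, \frac{E[Y(0) \mid S(0) = 1] - \Pr[S(1) = 0 \mid S(0) = 1]}{\Pr[S(1) = 1 \mid S(0) = 1]}\right\}. \tag{17}$$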

Combining (17) with (13) yields the upper bound on the principal effect of interest (10) and combining (16) with (13) yields the lower bound. These bounds were derived by Rotnitzky and Jemiai (2003), Zhang and Rubin (2003) and Hudgens, Hoering and Self (2003). Consistent estimates of (16) and (17) can be computed by replacing $E[Y(0) \mid S(0) = 1]$ with $\sum_i Y_i I(S_i = 1, Z_i = 0) / \sum_i I(S_i = 1, Z_i = 0)$ and $\Pr[S(1) = 1 \mid S(0) = 1]$ with

$$\min\left\{1, \frac{\sum_i I(S_i = Z_i = 1) / \sum_i I(Z_i = 1)}{\sum_i I(S_i = 1, Z_i = 0) / \sum_i I(Z_i = 0)}\right\}.$$

Returning to the pertussis vaccine study, the estimated lower and upper bounds of (10) are $-0.57$ and $-0.15$. These estimated bounds exclude zero, leading to the conclusion (ignoring sampling variability) that vaccination lowers the risk of severe pertussis in individuals who will become infected regardless of whether they are vaccinated.
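
As a rough illustration of the plug-in estimators described above, the following Python sketch computes estimated bounds for (10) from trial records. The function name and the `(z, s, y)` record layout are assumptions made for illustration, and no real trial data are used.

```python
def principal_effect_bounds(records):
    """Plug-in bounds for the principal effect (10).

    records: iterable of (z, s, y) tuples with binary entries, where z is the
    assigned arm, s the infection indicator and y the severe-disease indicator
    (y is only used for infected individuals). This layout is an assumption.
    Returns (lower, upper).
    """
    records = list(records)
    n1 = sum(1 for z, _, _ in records if z == 1)
    n0 = sum(1 for z, _, _ in records if z == 0)
    infected1 = [y for z, s, y in records if z == 1 and s == 1]
    infected0 = [y for z, s, y in records if z == 0 and s == 1]

    mean_y1 = sum(infected1) / len(infected1)  # E[Y | S=1, Z=1], see (13)
    mean_y0 = sum(infected0) / len(infected0)  # E[Y | S=1, Z=0]

    # Estimated Pr[S(1)=1 | S(0)=1], truncated at 1 as in the text.
    pi = min(1.0, (len(infected1) / n1) / (len(infected0) / n0))

    upper_e11 = min(1.0, mean_y0 / pi)                  # bound (16)
    lower_e11 = max(0.0, (mean_y0 - (1.0 - pi)) / pi)   # bound (17)

    # (16) gives the lower bound on (10); (17) gives the upper bound.
    return mean_y1 - upper_e11, mean_y1 - lower_e11
```

The function assumes both arms contain at least one infected individual; otherwise the conditional means are undefined and a division by zero occurs.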

Note if $\Pr[S(1) = 1 \mid S(0) = 1] = 1$, that is, the vaccine has no protective effect against infection, then the protected principal stratum $S^{P_0} = (1,0)$ is empty and both (16) and (17) equal $E[Y(0) \mid S(0) = 1]$, meaning that (10) is identifiable and equals $E[Y \mid Z = 1, S = 1] - E[Y \mid Z = 0, S = 1]$. Intuitively, the lack of vaccine effect against infection eliminates the potential for selection bias.

As discussed in Section 2.5, incorporation of covariates can tighten bounds. For covariates $X$ with finite support, one simple approach of adjusting for covariates entails determining bounds within strata defined by the levels of $X$ and then taking a weighted average of the within strata bounds over the distribution of $X$. For the bounds in (16) and (17), adjustment for covariates will always lead to bounds that are at least as tight as bounds unadjusted for covariates (Lee (2009); Long and Hudgens (2013)).

If the observed data provide evidence contrary to monotonicity (12), then bounds may be obtained under only (11). Without monotonicity (12), the proportion of infected controls that are in the doomed principal stratum is no longer identified but may be bounded in order to arrive at bounds for $E[Y(0) \mid S^{P_0} = (1,1)]$. In addition, the harmed principal stratum defined by $S^{P_0} = (0,1)$ is no longer empty and thus $E[Y(1) \mid S^{P_0} = (1,1)]$ is no longer identifiable from the observed data and may also be bounded in a similar fashion to $E[Y(0) \mid S^{P_0} = (1,1)]$. Details regarding these bounds without the monotonicity assumption may be found in Zhang and Rubin (2003) and Grilli and Mealli (2008).

3.4 Sensitivity Analysis 

The bounds (16) and (17) are useful in bounding the vaccine effect on $Y$ in the doomed stratum. However, these bounds may be rather extreme. An alternative approach is to make an untestable assumption that identifies the post-infection vaccine effect on $Y$ and then consider how sensitive the resulting inference is to departures from this assumption. For instance, assuming

$$\Pr[Y(0) = 1 \mid S^{P_0} = (1,1)] = \Pr[Y(0) = 1 \mid S^{P_0} = (1,0)] \tag{18}$$

identifies (10). Hudgens and Halloran (2006) refer to this as the no selection model. To examine how inference varies according to departures from (18), following Scharfstein, Rotnitzky and Robins (1999), and Robins, Rotnitzky and Scharfstein (2000), consider the following sensitivity parameter:


$$\exp(\gamma) = \frac{\Pr[Y(0) = 1 \mid S^{P_0} = (1,1)]}{\Pr[Y(0) = 0 \mid S^{P_0} = (1,1)]} \cdot \left(\frac{\Pr[Y(0) = 1 \mid S^{P_0} = (1,0)]}{\Pr[Y(0) = 0 \mid S^{P_0} = (1,0)]}\right)^{-1}. \tag{19}$$


In words, $\exp(\gamma)$ compares the odds of severe disease when not vaccinated in the doomed versus the protected principal stratum. Assuming (18) corresponds to $\gamma = 0$. A sensitivity analysis entails examining how inference about (10) varies as $\gamma$ becomes farther from 0. For any fixed value of $\gamma$, (10) is identified (see Figure 1) and can be consistently estimated by maximum likelihood estimation without any additional assumptions (Gilbert, Bosch and Hudgens (2003)). The upper and lower bounds (16) and (17) are obtained by letting $\gamma \to \infty$ and $\gamma \to -\infty$, respectively. To see this, note that as $\gamma \to \infty$ (19) implies in the limit that either

$$\Pr[Y(0) = 1 \mid S^{P_0} = (1,1)] = 1 \quad \text{or} \quad \Pr[Y(0) = 1 \mid S^{P_0} = (1,0)] = 0,$$

which is equivalent to (15). Sensitivity analysis can be conducted by letting $\gamma$ range over a set of values $\Gamma$.
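
To make concrete how fixing the sensitivity parameter identifies the doomed-stratum mean, the sketch below solves the mixture equation (14) together with the odds ratio (19) by bisection. This is a simple plug-in calculation under stated assumptions, not the maximum likelihood procedure cited above; the function name and argument names are hypothetical.

```python
import math


def doomed_mean_given_gamma(mix_mean, pi, gamma, tol=1e-12):
    """Solve (14) and (19) for E[Y(0) | doomed] at a fixed gamma.

    mix_mean: E[Y(0) | S(0)=1], the mean among infected controls.
    pi:       Pr[S(1)=1 | S(0)=1], the doomed fraction among infected controls.
    gamma:    log odds ratio from (19); gamma = 0 is the no selection model (18).
    """
    odds_ratio = math.exp(gamma)

    def protected_mean(p11):
        # (19) rearranged: odds(p10) = odds(p11) / exp(gamma), written so that
        # no division by zero occurs at p11 = 0 or p11 = 1.
        return p11 / (p11 + odds_ratio * (1.0 - p11))

    # The mixture mean is increasing in p11, so bisection on [0, 1] applies.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pi * mid + (1.0 - pi) * protected_mean(mid) < mix_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating the function over a grid of `gamma` values in the chosen range, and subtracting each result from the estimate of $E[Y \mid S = 1, Z = 1]$, traces out how the estimate of (10) moves between its bounds.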

Tighter bounds can be achieved by placing restrictions on $\Gamma$, perhaps based on prior beliefs about $\gamma$ elicited from subject matter experts. For example, Shepherd, Gilbert and Mehrotra (2007) surveyed 10 recognized HIV experts in order to elicit a plausible range for a sensitivity parameter representing a departure from the assumption of no selection bias between vaccinated and unvaccinated individuals who acquired HIV during an HIV vaccine trial. Included in this survey were the analysis approach, a brief explanation of the potential for selection bias, the definition of the sensitivity parameter being employed, examples of the implications of certain sensitivity parameter values on selection bias and possible justification for believing certain values of the sensitivity parameter. The expert responses to the survey were fairly consistent and several written justifications for the respondents’ chosen ranges indicated a high level of understanding of both the counterfactual nature of the sensitivity parameter and the need to account for selection bias.

4. RANDOMIZED STUDIES WITH PARTIAL
COMPLIANCE

4.1 Global Average Treatment Effect 

In a placebo controlled randomized trial where (5) holds but there is noncompliance (i.e., individuals are randomly assigned to treatment or control but they do not necessarily adhere or comply with their assigned treatment), the naive estimator is a consistent estimator of the average effect of treatment assignment. However, in this case parameters other than the effect of treatment assignment may be of interest. As in the last section, a principal effect may be defined using compliance as the intermediate post-randomization variable over which to define principal strata; namely, the principal strata would consist of individuals who would comply with their randomization assignment whether assigned treatment or control (“compliers”), individuals who would always take treatment regardless of randomization (“always takers”), individuals who would never take treatment (“never takers”) and individuals who would take treatment only if assigned control (“defiers”). A principal effect of interest might be the effect of treatment in the complier principal stratum (Imbens and Angrist (1994); Angrist, Imbens and Rubin (1996)), in which case bounds and sensitivity analyses similar to those in Section 3 are applicable. However, as several authors including Robins (1989) and Robins and Greenland (1996) have pointed out, such principal effects may not be of ultimate public health interest because they only apply to the subpopulation of compliers in clinical trials, which may differ from the population that elects to take treatment once licensed. For example, once efficacy is proved, a larger subpopulation of people may be willing to take the treatment. Effects defined on the subpopulation of compliers are also of limited decision-making utility because individual principal stratum membership is generally unknown prior to treatment assignment (Joffe (2011)).

Robins and Greenland (1996) suggested that in settings where the trial population could be persuaded to take the treatment once licensed, a more relevant public health estimand is the global average treatment effect, defined as the average effect of actually taking treatment versus not taking treatment given treatment assignment $z$. This causal estimand is similar to the average treatment effect defined in Section 2, but requires generalizing the potential outcome definitions used previously to include separate potential outcomes for each of the four combinations of treatment assignment and actual treatment received. For further discussion regarding causal models in the presence of noncompliance, see Chickering and Pearl (1996) and Dawid (2003) among others.

Suppose we observe data from a clinical trial where each individual is randomly assigned to treatment or control. Let $Z$ indicate treatment assignment where $Z = 1$ denotes treatment and $Z = 0$ denotes control. Suppose individuals do not necessarily comply with their randomization assignment and let $S$ be a variable indicating whether or not treatment was actually taken, where $S = 1$ denotes treatment was taken and $S = 0$ otherwise. Thus, an individual is compliant with their randomization assignment if $S = Z$. Let $Y$ be a binary outcome of interest. Denote the potential treatment taken by $S(z)$ for $z = 0, 1$, where $S(z) = 1$ indicates taking treatment when assigned $z$ and $S(z) = 0$ denotes not taking treatment when assigned $z$. Let $Y(z, s)$ denote the potential outcome if an individual is assigned treatment $z$ but actually takes treatment $s$. Conceiving of these potential outcomes depends on a supposition that trial participants who did not comply in the trial could be persuaded to take the treatment under other circumstances. Given this supposition, the global average treatment effect for each treatment assignment $z = 1$ and $z = 0$ is defined as $\mathrm{GATE}_z = E[Y(z, 1) - Y(z, 0)]$. For instance, $\mathrm{GATE}_1$ is the difference in the average outcomes under the counterfactual scenario in which everyone was assigned vaccine and did comply versus the counterfactual scenario in which everyone was assigned vaccine but did not comply.

Bounds for $\mathrm{GATE}_z$ are given below under three assumptions: independent treatment assignment

$$Z \perp\!\!\!\perp \{S(0), S(1), Y(0,0), Y(0,1), Y(1,0), Y(1,1)\}; \tag{20}$$

monotonicity with respect to $S$

$$\Pr[S(1) \ge S(0)] = 1; \tag{21}$$

and the exclusion restriction

$$Y(0, s) = Y(1, s) \quad \text{for } s = 0, 1. \tag{22}$$

Assumption (22) indicates treatment assignment has no effect when the actual treatment taken is held fixed. Under (22), $\mathrm{GATE}_0 = \mathrm{GATE}_1$ which we denote by GATE. In this case each individual has two potential outcomes according to $s = 0$ and $s = 1$ [which could be denoted by $Y(s) = Y(0, s) = Y(1, s)$ for $s = 0, 1$] and GATE is equivalent to the ATE discussed in Section 2 with $z$ replaced by $s$. Robins (1989) derived bounds for GATE under several different combinations of (20)-(22) as well as some additional assumptions such as monotonicity with respect to $S$, that is, $Y(z, 1) \ge Y(z, 0)$ for $z = 0, 1$. Manski (1990) independently derived related results. Under (20)-(22), the sharp lower and upper bounds on GATE are

$$-1 + \max_z\{\Pr[Y = 1, S = 1 \mid Z = z]\} + \max_z\{\Pr[Y = 0, S = 0 \mid Z = z]\} \tag{23}$$

and

$$1 - \max_z\{\Pr[Y = 0, S = 1 \mid Z = z]\} - \max_z\{\Pr[Y = 1, S = 0 \mid Z = z]\}. \tag{24}$$

Balke and Pearl (1997) derived sharp bounds for GATE under a variety of assumptions, including (20)-(22), by recognizing that the derivation of the bounds is equivalent to a linear programming optimization problem. To see that bounds can be formulated as a linear programming optimization problem, first note that GATE can be expressed as a linear combination of probabilities of the joint distribution of $L = (Y(0,0), Y(0,1), Y(1,0), Y(1,1), S(0), S(1))$

$$\sum_{l_1 \in \mathcal{C}_1} \Pr[L = l_1] - \sum_{l_0 \in \mathcal{C}_0} \Pr[L = l_0], \tag{25}$$

where $\mathcal{C}_s$ is the set of possible realizations of $L$ where $Y(0, s) = Y(1, s) = 1$ for $s = 0, 1$. Under independent treatment assignment, there exists a linear transformation between the probabilities in the joint distribution of $L$ and the probabilities in the conditional distribution of the observable random variables $Y$ and $S$ given $Z$, namely

$$\Pr[Y = y, S = s \mid Z = z] = \sum_{l \in \mathcal{O}_{ys \cdot z}} \Pr[L = l], \tag{26}$$

where $\mathcal{O}_{ys \cdot z}$ is the set of possible realizations of $L$ where $S(z) = s$ and $Y(z, s) = y$ for $z, y, s = 0, 1$. To find the sharp bounds, the objective function (25) is minimized (or maximized) subject to the constraints (26), $\Pr[L = l] \ge 0$ for every $l \in \mathcal{L}$, and $\sum_{l \in \mathcal{L}} \Pr[L = l] = 1$ where $\mathcal{L}$ is the set of all possible realizations of $L$ assuming (21) and (22). Optimization may be accomplished using the simplex algorithm and the dimension of this problem permits obtaining a closed form solution involving probabilities of the observed data distribution (Balke and Pearl (1993)), namely (23) and (24).

If in addition to assumptions (20) and (22), it is assumed that

$$E[Y(z, 1) - Y(z, 0) \mid Z = 1, S = s] = E[Y(z, 1) - Y(z, 0) \mid Z = 0, S = s] \tag{27}$$

for $s, z = 0, 1$ then GATE is identified and equals

$$\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[S \mid Z = 1] - E[S \mid Z = 0]} \tag{28}$$

(Hernán and Robins (2006)). For $s = 0$ assumption (27) is known as a no current treatment interaction assumption (Robins (1994)), and expression (28) is known as the instrumental variables estimand (Imbens and Angrist (1994); Angrist, Imbens and Rubin (1996)). Sensitivity analyses may be conducted by defining sensitivity parameters representing departures from (20), (22) or (27) and then examining how inference about GATE varies as values of these parameters change. For instance, Robins, Rotnitzky and Scharfstein (2000) define current treatment interaction functions which represent a departure from (27) for $s = 0$.
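
As a small illustration of the instrumental variables estimand (28), the sketch below computes the ratio from arm-specific means. The function name and arguments are hypothetical, and the guard against a zero denominator is an added assumption rather than part of (28).

```python
def iv_estimand(mean_y_z1, mean_y_z0, mean_s_z1, mean_s_z0):
    """Ratio in (28): effect of assignment on Y over effect of assignment on S.

    mean_y_z1, mean_y_z0: estimates of E[Y | Z=1] and E[Y | Z=0].
    mean_s_z1, mean_s_z0: estimates of E[S | Z=1] and E[S | Z=0].
    """
    denominator = mean_s_z1 - mean_s_z0
    if denominator == 0:
        # Assignment does not change treatment uptake, so (28) is undefined.
        raise ValueError("E[S | Z=1] equals E[S | Z=0]; the ratio is undefined")
    return (mean_y_z1 - mean_y_z0) / denominator
```

For example, with hypothetical arm means `iv_estimand(0.5, 0.3, 0.8, 0.0)`, the assignment effect of 0.2 on the outcome is scaled up by the 0.8 difference in uptake, giving about 0.25.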

4.2 Cholestyramine Example 

To illustrate the GATE, we consider data presented in Pearl (2009, Section 8.2.6) on 337 subjects who participated in a randomized trial to assess the effect of cholestyramine on cholesterol reduction. Let $Z = 1$ denote assignment to cholestyramine and $Z = 0$ assignment to placebo. Let $S = 1$ if cholestyramine was actually taken by the participant and $S = 0$ otherwise. Let $Y = 1$ if the participant had a response and $Y = 0$ otherwise, where response is defined as reduction in the level of cholesterol by 28 units or more. Pearl reported the following observed proportions:

$$\begin{aligned}
\Pr[Y = 0, S = 0 \mid Z = 0] &= 0.919, \\
\Pr[Y = 0, S = 0 \mid Z = 1] &= 0.315, \\
\Pr[Y = 0, S = 1 \mid Z = 0] &= 0.000, \\
\Pr[Y = 0, S = 1 \mid Z = 1] &= 0.139, \\
\Pr[Y = 1, S = 0 \mid Z = 0] &= 0.081, \\
\Pr[Y = 1, S = 0 \mid Z = 1] &= 0.073, \\
\Pr[Y = 1, S = 1 \mid Z = 0] &= 0.000, \\
\Pr[Y = 1, S = 1 \mid Z = 1] &= 0.473.
\end{aligned}$$

No participants assigned placebo actually took cholestyramine, suggesting the monotonicity assumption (21) is reasonable. On the other hand, 38.8% of individuals assigned treatment did not actually take cholestyramine.

From (23) and (24), the bounds on GATE assuming (20), (21) and (22) are estimated to be $-1 + \max\{0.000, 0.473\} + \max\{0.919, 0.315\} = 0.392$ and $1 - \max\{0.000, 0.139\} - \max\{0.081, 0.073\} = 0.780$. The positive sign of the estimated bounds indicates the treatment is beneficial. Pearl interprets the estimated bounds as follows: “although 38.8% of the subjects deviated from their treatment protocol, the experimenter can categorically state that, when applied uniformly to the population, the treatment is guaranteed to increase by at least 39.2% the probability of reducing the level of cholesterol by 28 points or more.” Such an interpretation does not account for sampling variability, the topic of Section 7.
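
The arithmetic above can be reproduced with a short Python sketch of (23) and (24). The function name and the dictionary layout `p[(y, s, z)]` are assumptions for illustration; the proportions are the ones Pearl reported, as listed earlier in this section.

```python
def gate_bounds(p):
    """Bounds (23) and (24) on GATE.

    p[(y, s, z)] holds Pr[Y=y, S=s | Z=z]; this key layout is an assumption.
    Returns (lower, upper).
    """
    lower = -1.0 + max(p[(1, 1, 0)], p[(1, 1, 1)]) + max(p[(0, 0, 0)], p[(0, 0, 1)])
    upper = 1.0 - max(p[(0, 1, 0)], p[(0, 1, 1)]) - max(p[(1, 0, 0)], p[(1, 0, 1)])
    return lower, upper


# Observed proportions from the cholestyramine trial, keyed as (y, s, z).
cholestyramine = {
    (0, 0, 0): 0.919, (0, 0, 1): 0.315,
    (0, 1, 0): 0.000, (0, 1, 1): 0.139,
    (1, 0, 0): 0.081, (1, 0, 1): 0.073,
    (1, 1, 0): 0.000, (1, 1, 1): 0.473,
}

lower, upper = gate_bounds(cholestyramine)
print(round(lower, 3), round(upper, 3))  # 0.392 0.78
```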

5. MEDIATION ANALYSIS 
5.1 Natural Direct and Indirect Effects 

As demonstrated in Sections 3 and 4, independent treatment assignment does not guarantee that the causal estimand of interest will be identifiable. Another setting where this occurs is in mediation analysis, where researchers are interested in whether or not the effect of a treatment is mediated by some intermediate variable. Even in studies where treatment is assigned randomly and there is perfect compliance, confounding may exist between the intermediate variable and the outcome of interest such that effects describing the mediated relationships will not in general be identifiable. Thus, bounds and sensitivity analysis may be helpful in drawing inference.

To illustrate, let $Y$ be an observed binary outcome of interest, and $S$ a binary intermediate variable observed some time between treatment assignment $Z$ and the observation of $Y$. The goal is to assess whether and to what extent the effect of $Z$ on $Y$ is mediated by or through $S$. Denote the potential outcome of the intermediate variable under treatment $z$ by $S(z)$ for $z = 0, 1$ such that $S = S(Z)$, and the potential outcomes under treatment $z$ and intermediate $s$ as $Y(z, s)$ such that $Y = Y(Z, S(Z))$. Here, as in the previous section, it is assumed that both $Z$ and $S$ can be set to particular fixed values, such that there are four potential outcomes for $Y$ per individual. Unless otherwise specified, independent treatment assignment (20) will be assumed throughout this section.

Define the total effect of treatment to be $E[Y(1, S(1)) - Y(0, S(0))]$, which is equivalent to the ATE defined in Section 2.1. The total effect of treatment can be decomposed in the following way:

$$\begin{aligned} E[Y(1, S(1)) - Y(0, S(0))] &= E[Y(1, S(z)) - Y(0, S(z))] \\ &\quad + E[Y(z', S(1)) - Y(z', S(0))] \end{aligned} \tag{29}$$

for $z = 0, 1$ and $z' = 1 - z$. The right-hand side of (29) decomposes the total effect into the sum of two separate effects. The first expectation on the right-hand side of (29) is the natural direct effect for treatment $z$, $\mathrm{NDE}_z = E[Y(1, S(z)) - Y(0, S(z))]$ (Robins and Greenland (1992); Pearl (2001); Robins (2003); Kaufman, Kaufman and MacLehose (2009); Robins and Richardson (2010)). The natural direct effect is the average effect of the treatment on the outcome when the intermediate variable is set to the potential value that would occur under treatment assignment $z$. The second expectation on the right-hand side of (29) is the natural indirect effect, $\mathrm{NIE}_z = E[Y(z, S(1)) - Y(z, S(0))]$ (Pearl (2001); Robins (2003); Imai, Keele and Yamamoto (2010)). The natural indirect effect is the difference in the average outcomes when treatment is set to $z$ and the intermediate variable is set to the value that would have occurred under treatment compared to if the intermediate variable were set to the value that would have occurred under control.
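
The decomposition (29) can be checked numerically on a toy population in which every potential outcome is known. The sketch below is only an illustration: the function name, the data layout and the three response types in the usage note are hypothetical, and in real data these potential outcomes are never jointly observed.

```python
def mediation_effects(population):
    """Total, natural direct and natural indirect effects from full potential outcomes.

    population: list of (weight, s, y) where weight is the population share,
    s[z] is S(z) and y[(z, m)] is Y(z, m). This layout is an assumption.
    Returns (total, nde, nie) with nde and nie as dicts keyed by z.
    """
    total = 0.0
    nde = {0: 0.0, 1: 0.0}
    nie = {0: 0.0, 1: 0.0}
    for weight, s, y in population:
        total += weight * (y[(1, s[1])] - y[(0, s[0])])
        for z in (0, 1):
            # NDE_z: change treatment, hold the mediator at S(z).
            nde[z] += weight * (y[(1, s[z])] - y[(0, s[z])])
            # NIE_z: hold treatment at z, change the mediator from S(0) to S(1).
            nie[z] += weight * (y[(z, s[1])] - y[(z, s[0])])
    return total, nde, nie


# Hypothetical population with three response types.
toy_population = [
    # Treatment acts only through the mediator.
    (0.5, {0: 0, 1: 1}, {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}),
    # Treatment acts only directly.
    (0.3, {0: 0, 1: 0}, {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 1}),
    # Outcome occurs regardless of treatment and mediator.
    (0.2, {0: 1, 1: 1}, {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}),
]
```

On this toy population the total effect is 0.8, which equals both $\mathrm{NDE}_0 + \mathrm{NIE}_1$ and $\mathrm{NDE}_1 + \mathrm{NIE}_0$, in agreement with (29).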

Though the total effect is identifiable assuming (20), the natural direct and indirect effects are not identifiable since they entail $E[Y(z, S(1 - z))]$ which depends on unobserved counterfactual distributions. Sjölander (2009) derived bounds for the natural direct effects assuming only independent treatment assignment (20) using the linear programming technique of Balke and Pearl (1997). This results in the following sharp lower and upper bounds for $\mathrm{NDE}_0$ and $\mathrm{NDE}_1$:

$$\max\left\{\begin{array}{l} -p_{11\cdot 0} - p_{10\cdot 0}, \\ p_{11\cdot 1} + p_{01\cdot 0} - 1 - p_{10\cdot 0}, \\ p_{10\cdot 1} + p_{00\cdot 0} - 1 - p_{11\cdot 0} \end{array}\right\} \le \mathrm{NDE}_0 \le \min\left\{\begin{array}{l} p_{01\cdot 0} + p_{00\cdot 0}, \\ 1 - p_{00\cdot 1} + p_{01\cdot 0} - p_{10\cdot 0}, \\ 1 - p_{01\cdot 1} + p_{00\cdot 0} - p_{11\cdot 0} \end{array}\right\}, \tag{30}$$

$$\max\left\{\begin{array}{l} -p_{01\cdot 1} - p_{00\cdot 1}, \\ p_{00\cdot 0} - 1 - p_{01\cdot 1} + p_{10\cdot 1}, \\ p_{01\cdot 0} - 1 - p_{00\cdot 1} + p_{11\cdot 1} \end{array}\right\} \le \mathrm{NDE}_1 \le \min\left\{\begin{array}{l} p_{11\cdot 1} + p_{10\cdot 1}, \\ 1 - p_{01\cdot 1} + p_{10\cdot 1} - p_{11\cdot 0}, \\ 1 - p_{00\cdot 1} + p_{11\cdot 1} - p_{10\cdot 0} \end{array}\right\}, \tag{31}$$

where $p_{ys\cdot z} = \Pr(Y = y, S = s \mid Z = z)$. These bounds may exclude 0, indicating a natural direct effect of treatment $z$ when the intermediate variable is set to $S(z)$ (ignoring sampling variability). There are instances where the bounds in (30) and (31) may collapse to a single point, for example, if $p_{10\cdot 0} = p_{10\cdot 1} = 1$. Using (29), bounds for $\mathrm{NIE}_0$ and $\mathrm{NIE}_1$ can be obtained by subtracting the bounds for $\mathrm{NDE}_1$ and $\mathrm{NDE}_0$ from the total effect, which is identified under (20) and equal to $(p_{11\cdot 1} + p_{10\cdot 1}) - (p_{10\cdot 0} + p_{11\cdot 0})$.
Just as in Sections 2-4, monotonicity assumptions can be made to tighten the above bounds. For instance, if

$$\Pr[S(0) \le S(1)] = 1, \quad \Pr[Y(0, s) \le Y(1, s)] = 1 \text{ for } s = 0, 1 \quad \text{and} \quad \Pr[Y(z, 0) \le Y(z, 1)] = 1 \text{ for } z = 0, 1,$$

are assumed, then $\Pr[L = l] = 0$ for all $l$ such that (i) $S(0) = 1$ and $S(1) = 0$, (ii) $Y(0, s) = 1$ and $Y(1, s) = 0$ for $s = 0$ or 1 or (iii) $Y(z, 0) = 1$ and $Y(z, 1) = 0$ for $z = 0$ or 1, which restricts the feasible region of the linear programming problem. The resulting sharp bounds for the natural direct effect are

$$\max\left\{0,\; p_{01\cdot 0} - p_{01\cdot 1},\; p_{10\cdot 1} - p_{10\cdot 0},\; p_{01\cdot 0} - p_{01\cdot 1} + p_{10\cdot 1} - p_{10\cdot 0}\right\} \le \mathrm{NDE}_z \le p_{10\cdot 1} + p_{11\cdot 1} - p_{10\cdot 0} - p_{11\cdot 0} \tag{32}$$

(Sjölander (2009)). The bounds (32) are always at least as narrow as (30) and (31). Interestingly these narrower bounds do not depend on $z$. The bounds in (32) may also collapse to a single point, for example, if $p_{10\cdot 0} = p_{10\cdot 1}$ and $p_{01\cdot 0} - p_{01\cdot 1} = p_{11\cdot 1} - p_{11\cdot 0}$.
The natural direct effect provides insight into whether or not treatment yields additional benefit on the outcome of interest when the influence of treatment on the intermediate variable is eliminated. However, researchers might also be interested in what benefit is provided by treatment if the effect of the intermediate variable on the outcome is eliminated or held constant. This question suggests a different causal estimand known as the controlled direct effect. Bounds for the controlled direct effect can be found in Pearl (2001), Cai et al. (2008), Sjölander (2009) and VanderWeele (2011a).
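
For readers who wish to evaluate the bounds (30)-(32) numerically, a minimal Python sketch follows. The function names and the dictionary layout `p[(y, s, z)]` for $p_{ys\cdot z}$ are assumptions for illustration, and the expressions are transcribed from the bounds as stated in this section.

```python
def nde_bounds(p):
    """Bounds (30) and (31) under independent treatment assignment only.

    p[(y, s, z)] holds Pr(Y=y, S=s | Z=z); this key layout is an assumption.
    Returns {0: (lower, upper), 1: (lower, upper)} for NDE_0 and NDE_1.
    """
    def q(y, s, z):
        return p[(y, s, z)]

    nde0 = (
        max(-q(1, 1, 0) - q(1, 0, 0),
            q(1, 1, 1) + q(0, 1, 0) - 1 - q(1, 0, 0),
            q(1, 0, 1) + q(0, 0, 0) - 1 - q(1, 1, 0)),
        min(q(0, 1, 0) + q(0, 0, 0),
            1 - q(0, 0, 1) + q(0, 1, 0) - q(1, 0, 0),
            1 - q(0, 1, 1) + q(0, 0, 0) - q(1, 1, 0)),
    )
    nde1 = (
        max(-q(0, 1, 1) - q(0, 0, 1),
            q(0, 0, 0) - 1 - q(0, 1, 1) + q(1, 0, 1),
            q(0, 1, 0) - 1 - q(0, 0, 1) + q(1, 1, 1)),
        min(q(1, 1, 1) + q(1, 0, 1),
            1 - q(0, 1, 1) + q(1, 0, 1) - q(1, 1, 0),
            1 - q(0, 0, 1) + q(1, 1, 1) - q(1, 0, 0)),
    )
    return {0: nde0, 1: nde1}


def nde_bounds_monotone(p):
    """Bounds (32), which additionally assume the three monotonicity conditions."""
    a = p[(0, 1, 0)] - p[(0, 1, 1)]
    b = p[(1, 0, 1)] - p[(1, 0, 0)]
    lower = max(0.0, a, b, a + b)
    # The upper bound is the total effect.
    upper = p[(1, 0, 1)] + p[(1, 1, 1)] - p[(1, 0, 0)] - p[(1, 1, 0)]
    return lower, upper
```

Bounds for the natural indirect effects then follow by subtracting these intervals from the total effect, as described in the text.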

5.2 Sensitivity Analysis 

As in other settings where the effect of interest is not identifiable, sensitivity analysis in the mediation setting may be conducted by making untestable assumptions that identify the direct or indirect effects. Then sensitivity of inference to departures from these assumptions can be examined. For example, if (20) holds, then the natural direct and indirect effects are identified under the following additional assumptions

$$Y(z, s) \perp\!\!\!\perp S \mid Z \quad \text{for } z, s = 0, 1 \quad \text{and} \tag{33}$$

$$Y(z, s) \perp\!\!\!\perp S(z') \quad \text{for } z, z', s = 0, 1 \tag{34}$$

(Pearl (2001); VanderWeele (2010)). Assumption (33) would be valid if subjects were randomly assigned $S$ within different levels of treatment assignment $Z$. In settings where $S$ is not randomly assigned, (33) might be considered plausible if it is believed that conditional on $Z$ there are no variables which confound the mediator-outcome relationship. Both assumptions (33) and (34) will not hold in general if $Z$ has an effect on some other intermediate variable, say $R$, which in turn has an effect on both $S$ and $Y$. Thus, (33) and (34) may fail unless the mediator $S$ occurs shortly after treatment $Z$. Under assumptions (20), (33) and (34),

$$\mathrm{NDE}_z = (-1)^z \sum_s \{E[Y \mid Z = 1 - z, S = s] - E[Y \mid Z = z, S = s]\} \Pr[S = s \mid Z = z]$$

and

$$\mathrm{NIE}_z = (-1)^z \sum_s E[Y \mid Z = z, S = s] \{\Pr[S = s \mid Z = 1 - z] - \Pr[S = s \mid Z = z]\}.$$
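
The two identifying formulas above translate directly into code. The following Python sketch is illustrative only: the function name and the dictionary layouts are assumptions, and the inputs would in practice be estimated conditional means and probabilities.

```python
def natural_effects(mean_y, prob_s):
    """Natural direct and indirect effects under (20), (33) and (34).

    mean_y[(z, s)] holds E[Y | Z=z, S=s] and prob_s[(s, z)] holds Pr[S=s | Z=z];
    both key layouts are assumptions. Returns (nde, nie) as dicts keyed by z.
    """
    nde, nie = {}, {}
    for z in (0, 1):
        sign = (-1) ** z
        nde[z] = sign * sum(
            (mean_y[(1 - z, s)] - mean_y[(z, s)]) * prob_s[(s, z)]
            for s in (0, 1)
        )
        nie[z] = sign * sum(
            mean_y[(z, s)] * (prob_s[(s, 1 - z)] - prob_s[(s, z)])
            for s in (0, 1)
        )
    return nde, nie
```

A useful sanity check on any implementation is that `nde[0] + nie[1]` and `nde[1] + nie[0]` both reproduce the total effect $E[Y \mid Z = 1] - E[Y \mid Z = 0]$, as required by (29).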

Because assumptions (33) and (34) cannot be empirically tested, sensitivity analysis should be conducted. Similar to Section 2.4, sensitivity analysis might proceed by positing the existence of an unmeasured confounding variable $U$ associated with the potential mediator values $S(z)$ and the potential outcomes $Y(z, s)$ for $z, s = 0, 1$. Assumption (33) would then be replaced by $Y(z, s) \perp\!\!\!\perp S \mid \{Z, U\}$ and (34) by $Y(z, s) \perp\!\!\!\perp S(z') \mid U$ for $s, z, z' = 0, 1$. Sensitivity analysis would then proceed by exploring how inference about the natural direct and indirect effects changes as the magnitudes of the associations of $U$ with $S(z)$ and $Y(z', s)$ for $z, z', s = 0, 1$ vary. For further details regarding bounds and sensitivity analysis in mediation analysis, see Imai, Keele and Yamamoto (2010), VanderWeele (2010) and Hafeman (2011).

6. LONGITUDINAL TREATMENT 
6.1 Background 

In Sections 2-5, treatment is assumed to remain fixed across follow up time and outcomes are one-dimensional. However, frequently researchers are interested in assessing causal effects comparing longitudinal outcomes for patients on different treatment regimens where treatment may vary in time. As the number of times at which an individual may receive treatment increases, the number of possible treatment regimens increases exponentially. Because each treatment regimen corresponds to a separate potential (longitudinal) outcome and only one potential outcome is ever observed, the fraction of potential outcomes that are unobserved quickly grows close to one as the number of possible treatment times increases. As in other settings, unless treatment regimens are randomly assigned, regimen effects will not be identifiable without additional assumptions. In the longitudinal setting, bounds will typically be largely uninformative because of the high proportion of unobserved potential outcomes. Therefore, analyses usually proceed by invoking modeling assumptions that render treatment effects identifiable and then conducting sensitivity analysis corresponding to key untestable modeling assumptions.

Models for potential outcomes as functions of covariates (such as treatment) and possibly other potential outcomes are often referred to as structural models. For longitudinal potential outcomes and treatments, popular models include structural nested models and marginal structural models (Robins, Rotnitzky and Scharfstein (2000); Robins (1999); van der Laan and Robins (2003); Brumback et al. (2004)). In Section 6.2 below, we consider a marginal structural model where the treatment effect is identified assuming conditionally independent treatment assignment. Sensitivity analyses exploring departures from this assumption are then considered in Section 6.3.

6.2 Marginal Structural Model 

Consider a study where individuals possibly receive treatment at $\tau$ fixed time points (i.e., study visits). In general let $\bar{A}(t) = (A(0), \ldots, A(t))$ represent the history of variable $A$ up to time $t$ and $\bar{A}$ be the entire history of variable $A$ such that $\bar{A} = \bar{A}(\tau)$. Let $z(t) = 1$ indicate treatment at visit $t$, and $z(t) = 0$ otherwise such that $\bar{z}$ represents a treatment regimen for visits $0, \ldots, \tau$. Denote the observed treatment regimen up to time $t$ as $\bar{Z}(t)$. Let $Y$ be some outcome of interest that may be categorical or continuous, and denote the potential outcome of $Y$ at visit $t$ for regimen $\bar{z}$ by $Y(\bar{z}, t)$ and the observed outcome by $Y(t)$. Let $\bar{X}(t)$ denote the history of some set of time varying covariates up to time $t$, where $X(0)$ denotes the baseline covariates. Assume for simplicity there is no loss to follow-up or noncompliance such that we observe $n$ i.i.d. copies of $(\bar{Z}, \bar{Y}, \bar{X})$.

Consider the following marginal structural model of the mean potential outcome were the entire population to follow regimen $\bar{z}$ up to time $t$:

$$g(E[Y(\bar{z}, t) \mid X(0) = x(0)]) = \beta_0 + \beta_1 \,\mathrm{cum}[\bar{z}(t-1)] + \beta_2 t + \beta_3 x(0) \tag{35}$$

for $t \in \{1, \ldots, \tau\}$, where $\mathrm{cum}[\bar{z}(t-1)] = \sum_{k \le t-1} z(k)$ and $g(\cdot)$ is an appropriate link function. The causal estimand of interest is $\beta_1$, the regression coefficient for $\mathrm{cum}[\bar{z}(t-1)]$, which is the effect of having received treatment at one additional visit prior to time $t$ conditional on baseline covariates $X(0)$. Because (35) involves counterfactual outcome distributions, $\beta_1$ is not identifiable without additional assumptions. One additional assumption is conditionally independent treatment assignment

$$Y(\bar{z}, t) \perp\!\!\!\perp Z(k) \mid \{\bar{Z}(k-1), \bar{X}(k)\} \quad \text{for all } \bar{z} \text{ and } t > k \tag{36}$$

(Robins, Rotnitzky and Scharfstein (2000); Robins (1999); Brumback et al. (2004)). This assumption is true if the potential outcome at visit $t$ under treatment regimen $\bar{z}$ is independent of the observed treatment at visit $k$ given the history of treatment up to visit $k - 1$ and the covariate history up to visit $k$. Assuming both a correctly specified model (35) and conditionally independent treatment assignment (36), fitting the following model to the observed data:

$$g(E[Y(t) \mid \bar{Z}(t-1) = \bar{z}(t-1), X(0) = x(0)]) = \eta_0 + \eta_1 \,\mathrm{cum}[\bar{z}(t-1)] + \eta_2 t + \eta_3 x(0)$$

using generalized estimating equations with an independent working correlation matrix and time varying inverse probability of treatment weights (IPTW) yields an estimator $\hat{\eta}_1$ that is consistent for $\beta_1$ (Tchetgen Tchetgen et al., 2012a, 2012b).
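
As a sketch of the two ingredients of the weighted fit described above, the Python functions below compute the cumulative treatment covariate and an unstabilized inverse probability of treatment weight for one individual. The function names are hypothetical, visits are assumed to be indexed from 0, and the treatment probabilities are assumed to come from a separately fitted model; stabilized weights, which are common in practice, are not shown.

```python
def cumulative_treatment(z_history, t):
    """cum[z-bar(t-1)]: number of visits 0, ..., t-1 at which treatment was received."""
    return sum(z_history[:t])


def iptw_weight(z_history, treat_probs, t):
    """Unstabilized inverse probability of treatment weight for the outcome at visit t.

    z_history[k]:   observed treatment indicator Z(k).
    treat_probs[k]: fitted Pr[Z(k)=1 | treatment history to k-1, covariate
                    history to k]; supplying these is the caller's job (assumption).
    """
    weight = 1.0
    for k in range(t):
        # Probability of the treatment actually received at visit k.
        p_observed = treat_probs[k] if z_history[k] == 1 else 1.0 - treat_probs[k]
        weight /= p_observed
    return weight
```

The weighted regression of $Y(t)$ on `cumulative_treatment`, $t$ and baseline covariates would then be fit with standard generalized estimating equation software.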

6.3 Sensitivity Analysis 

If assumption (36) does not hold, then the IPTW estimator $\hat{\eta}_1$ is not necessarily consistent. Because (36) is not testable from the observed data, sensitivity analysis might be considered to assess robustness of inference to departures from (36). Following Robins (1999) and Brumback et al. (2004), let

$$\begin{aligned} c(t, k, \bar{z}(t-1), \bar{x}(k)) &= E[Y(\bar{z}, t) \mid \bar{Z}(k) = \bar{z}(k), \bar{X}(k) = \bar{x}(k)] \\ &\quad - E[Y(\bar{z}, t) \mid Z(k) = 1 - z(k), \bar{Z}(k-1) = \bar{z}(k-1), \bar{X}(k) = \bar{x}(k)] \end{aligned}$$

for $t > k$ and $\bar{z}$ such that $\Pr[Z(k) = z(k) \mid \bar{Z}(k-1) = \bar{z}(k-1)]$ is bounded away from 0 and 1. The function $c$ quantifies departures from the conditional independent treatment assignment assumption (36) at each visit $t > k$, where $c(t, k, \bar{z}(t-1), \bar{x}(k)) = 0$ for all $\bar{z}$ and $t > k$ if (36) holds. For the identity link, a bias adjusted estimator of the causal effect $\beta_1$ may be obtained by recalculating the IPTW estimator with the observed outcome $Y(t)$ replaced by $Y^{\gamma}(t) = Y(t) - b(\bar{Z}(t-1), \bar{X}(t-1))$ where

$$b(\bar{Z}(t-1), \bar{X}(t-1)) = \sum_{k=0}^{t-1} c(t, k, \bar{Z}(t-1), \bar{X}(k)) \, f[1 - Z(k) \mid \bar{Z}(k-1), \bar{X}(k)]$$

and $f[z(k) \mid \bar{z}(k-1), \bar{x}(k)] = \Pr[Z(k) = z(k) \mid \bar{Z}(k-1) = \bar{z}(k-1), \bar{X}(k) = \bar{x}(k)]$ is an estimate of the conditional probability of the observed treatment based on some fitted parametric model (Brumback et al. (2004)). Provided this parametric model and $c$ are both correctly specified, this bias adjusted estimator, say $\tilde{\eta}_1$, is consistent for $\beta_1$. Sensitivity analysis proceeds by examining how $\tilde{\eta}_1$ changes when varying sensitivity parameters in $c(t, k, \bar{z}(t-1), \bar{x}(k))$.

Because $c(t,k,\bar{z}(t-1),\bar{x}(k))$ is not identifiable from the observable data, Robins (1999) recommends choosing a particular $c$ that is easily explainable to subject matter experts to facilitate eliciting plausible ranges of the sensitivity parameters. As an example of a particular $c$, Brumback et al. (2004) suggest $c(t,k,\bar{z}(t-1),\bar{x}(k)) = \gamma\{2z(k) - 1\}$ where $\gamma$ is an unidentifiable sensitivity analysis parameter. Note that $c(t,k,\bar{z}(t-1),\bar{x}(k)) = \gamma$ for $z(k) = 1$ and $c(t,k,\bar{z}(t-1),\bar{x}(k)) = -\gamma$ for $z(k) = 0$. Thus, $\gamma > 0$ ($\gamma < 0$) corresponds to subjects receiving treatment at time $k$ having greater (smaller) mean potential outcomes at future visit $t$ than those who did not receive treatment at visit $k$. When $\gamma = 0$, $Y(t) = Y^{\gamma}(t)$ and therefore $\hat{\eta}_1 = \tilde{\eta}_1$. The function $c$ might depend on the baseline covariates $X(0)$ or the time-varying covariates $\bar{X}(k)$. In this case, as in Section 2.5, care should be taken in clearly communicating the sensitivity parameters' relationship to these covariates when eliciting plausible ranges from subject matter experts. Another consideration when choosing a function $c$ is whether it will allow for the sharp null of no treatment effect, that is, for all individuals $Y(\bar{z},t) = Y(\bar{z}',t)$ for all $\bar{z}, \bar{z}', t$. The example function $c$ presented above allows for the sharp null. See Brumback et al. (2004) for other example $c$ functions and further discussion of sensitivity analysis for marginal structural models.
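The bias adjustment for the example function $c = \gamma\{2z(k) - 1\}$ can be sketched for a single subject as follows. This is a minimal illustration only: it assumes binary treatment and that the conditional treatment probabilities have already been obtained from some separately fitted model. The function name and argument names are hypothetical and do not come from the cited papers.

```python
def bias_adjusted_outcome(y_t, z_hist, p_treat, gamma):
    """Return Y^gamma(t) = Y(t) - b for one subject (illustrative sketch).

    y_t     : observed outcome Y(t)
    z_hist  : observed binary treatments Z(0), ..., Z(t-1)
    p_treat : fitted Pr[Z(k) = 1 | treatment and covariate history], k = 0..t-1
              (assumed to come from a separately fitted parametric model)
    gamma   : sensitivity parameter in c = gamma * (2 z(k) - 1)
    """
    if len(z_hist) != len(p_treat):
        raise ValueError("z_hist and p_treat must have the same length")
    b = 0.0
    for z_k, p_k in zip(z_hist, p_treat):
        c_k = gamma * (2 * z_k - 1)                   # +gamma if treated, -gamma if not
        f_opposite = (1 - p_k) if z_k == 1 else p_k   # f[1 - Z(k) | history]
        b += c_k * f_opposite
    return y_t - b


# Hypothetical subject: treated at visit 0, untreated at visit 1.
adjusted = bias_adjusted_outcome(1.0, [1, 0], [0.6, 0.3], gamma=0.5)
```

Setting `gamma=0` returns the observed outcome unchanged, matching the statement that $Y(t) = Y^{\gamma}(t)$ when $\gamma = 0$. The adjusted outcomes would then replace $Y(t)$ when refitting the weighted model.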

7. IGNORANCE AND UNCERTAINTY REGIONS

Treatment effect bounds describe ignorance due to partial identifiability but do not account for uncertainty due to sampling error. This section discusses some methods to appropriately quantify uncertainty due to sampling variability when drawing inference about partially identifiable treatment effects. Over the past decade, a growing body of research, especially in econometrics, has considered inference of partially identifiable parameters. The approach presented below draws largely upon Vansteelandt et al. (2006), who considered methods for quantifying uncertainty in the general setting where missing data causes partial identifiability. As questions about treatment (or causal) effects can be viewed as missing data problems, the approach of Vansteelandt et al. generally applies (under certain assumptions) to the type of problems considered throughout this paper. This approach builds on earlier work by Robins (1997) and others.

7.1 Ignorance Regions

Let $L$ be a vector containing the potential outcomes for an individual, let $O$ denote the observed data vector, and let $R$ be a vector containing indicator variables denoting whether the corresponding component of $L$ is observed. For example, $L = (Y(1),Y(0))$, $O = (Z,Y)$, and $R = (Z,(1-Z))$ for the scenario described in Section 2 and $L = (Y(1),Y(0),S(1),S(0))$, $O = (Z,Y,S)$ and $R = (Z,(1-Z),Z,(1-Z))$ for the scenario described in Section 3. Denote the distribution of $(L,R)$ by $f(L,R)$ and let $f(L) = \int f(L,R)\,dR$. The goal is to draw inference about a parameter vector $\beta$ which is a functional of the distribution of potential outcomes $L$; this is sometimes made explicit by writing $\beta = \beta\{f(L)\}$. Denote the true distribution of $(L,R)$ by $f_0(L,R)$ and the true value of $\beta$ by $\beta_0 = \beta\{\int f_0(L,R)\,dR\}$. For example, $\beta_0 = E[Y(1) - Y(0)]$ for the scenario described in Section 2 and $\beta_0 = E[Y(1) - Y(0) \mid S^{P_0} = (1,1)]$ for the scenario described in Section 3. Denote the true observed data distribution by $f_0(O) = \int f_0(L,R)\,dL_{(1-R)}$ where $L_{(1-r)}$ denotes the missing part of $L$ when $R = r$ (i.e., the unobserved potential outcomes). The challenge in drawing inference about $\beta_0$ is that there may be multiple full data distributions $f(L,R)$ that marginalize to the true observed data distribution, that is, $f_0(O) = \int f(L,R)\,dL_{(1-R)}$ for some $f \neq f_0$. When this occurs, $\beta$ may be only partially identifiable from $O$, in which case bounds can be derived for $\beta_0$ as illustrated in the sections above.

The set of values of $\beta\{f(L)\}$ such that $f(L,R)$ marginalizes to the true observed data distribution is sometimes called the ignorance region or the identified set. These ignorance regions or intervals are distinct from traditional confidence intervals in that as the sample size tends to infinity these intervals will not shrink to a single point when $\beta$ is partially identifiable. The ignorance region for $\beta$ can be defined formally as follows. Following Robins (1997), define a class $\mathcal{M}(\gamma)$ of full data laws indexed by some sensitivity parameter vector $\gamma$ to be nonparametrically identified if for each observed data law $f(O)$ there exists a unique law $f(L,R;\gamma) \in \mathcal{M}(\gamma)$ such that $f(O) = \int f(L,R;\gamma)\,dL_{(1-R)}$. In other words, the class $\mathcal{M}(\gamma)$ contains a unique distribution that marginalizes to each possible observed data distribution. For example, for the sensitivity analysis approach in Section 3.4, Hudgens and Halloran (2006, §4.3.3) defined a class of full data laws indexed by $\gamma$ given in (19) that is nonparametrically identified. The ignorance region for $\beta$ is formally defined to be

$$ir_{f_0}(\beta,\Gamma) = \Bigl\{\beta\{f(L)\} : f(L) = \int f(L,R;\gamma)\,dR \text{ for some } f(L,R;\gamma) \in \mathcal{M}(\Gamma) \text{ such that } \int f(L,R;\gamma)\,dL_{(1-R)} = f_0(O)\Bigr\}, \tag{37}$$

where $\Gamma$ is the set of all possible values of $\gamma$ under whatever set of assumptions is being invoked and $\mathcal{M}(\Gamma) = \bigcup_{\gamma \in \Gamma} \mathcal{M}(\gamma)$. Assume $\mathcal{M}(\Gamma)$ contains the true full data distribution, that is, $f_0(L,R) = f(L,R;\gamma_0)$ for some $\gamma_0 \in \Gamma$. [For considerations when $\mathcal{M}(\Gamma)$ does not contain the true full data distribution, see Todem, Fine and Peng (2010).] Because $\mathcal{M}(\gamma)$ is nonparametrically identified, for each $\gamma \in \Gamma$ there is a single $\beta(\gamma) = \beta\{\int f(L,R;\gamma)\,dR\}$ in the ignorance region (37). If $\mathcal{M}(\Gamma)$ includes all possible full data distributions that marginalize to any possible observed data distribution, then the ignorance region will contain the bounds.

In practice, the ignorance region will be unknown because it depends on the unknown true observed data distribution $f_0(O)$. For $\gamma$ fixed, $\beta(\gamma)$ is identifiable from the observed data and the ignorance region can be estimated by estimating $\beta(\gamma)$ for each value of $\gamma \in \Gamma$, denoted by $\hat{\beta}(\gamma)$. The resulting estimator of $ir_{f_0}(\beta,\Gamma)$ is then $\{\hat{\beta}(\gamma) : \gamma \in \Gamma\}$. For scalar $\hat{\beta}(\gamma)$, let $\hat{\beta}_l = \inf_{\gamma \in \Gamma}\{\hat{\beta}(\gamma)\}$ and $\hat{\beta}_u = \sup_{\gamma \in \Gamma}\{\hat{\beta}(\gamma)\}$ such that the estimated ignorance region is contained in the interval $[\hat{\beta}_l, \hat{\beta}_u]$.
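In practice the infimum and supremum are often approximated by evaluating the estimator over a finite grid of sensitivity parameter values. The sketch below assumes a scalar $\gamma$ and a user-supplied function returning $\hat{\beta}(\gamma)$. The estimator `toy_beta_hat` is purely hypothetical and stands in for whatever estimator the chosen sensitivity model provides.

```python
def estimated_ignorance_interval(beta_hat, gamma_grid):
    """Approximate [inf, sup] of beta_hat(gamma) over a finite grid of gamma values."""
    estimates = [beta_hat(g) for g in gamma_grid]
    if not estimates:
        raise ValueError("gamma_grid must be non-empty")
    return min(estimates), max(estimates)


# Hypothetical estimator, decreasing in gamma, used only to exercise the function.
def toy_beta_hat(gamma):
    return -0.3 - 0.05 * gamma


grid = [g / 10 for g in range(-30, 31)]   # Gamma = [-3, 3] in steps of 0.1
beta_l_hat, beta_u_hat = estimated_ignorance_interval(toy_beta_hat, grid)
```

A grid only approximates the extrema. When $\hat{\beta}(\gamma)$ is known to be monotone in $\gamma$, evaluating it at the two endpoints of $\Gamma$ is sufficient.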

7.2 Uncertainty Regions 

Estimated ignorance regions convey ignorance due to partial identifiability and do not reflect sampling variability in the estimates. Indeed much of the literature on bounds and sensitivity analysis of treatment effects tends to report estimated ignorance regions and either ignores sampling variability or employs ad hoc inferential approaches such as pointwise confidence intervals conditional on each value of the unidentifiable sensitivity parameter. More recent developments have provided a formal framework for conducting inference in partial identifiability settings (Imbens and Manski (2004); Vansteelandt et al. (2006); Romano and Shaikh (2008); Bugni (2010); Todem, Fine and Peng (2010)). The main focus in this research has been the construction of confidence regions for either the parameter $\beta_0$ or the ignorance region $ir_{f_0}(\beta,\Gamma)$.

Following Vansteelandt et al. (2006), a $(1-\alpha)$ pointwise uncertainty region for $\beta_0$ is defined to be a region $UR_p(\beta,\Gamma)$ such that

$$\inf_{\gamma \in \Gamma} \Pr\nolimits_{f_0}\{\beta(\gamma) \in UR_p(\beta,\Gamma)\} \geq 1 - \alpha,$$

where $\Pr_{f_0}\{\cdot\}$ denotes probability under $f_0(O)$. That is, $UR_p(\beta,\Gamma)$ contains $\beta(\gamma)$ with at least probability $1-\alpha$ for all $\gamma \in \Gamma$. In particular, assuming $\gamma_0 \in \Gamma$, then $UR_p(\beta,\Gamma)$ will contain $\beta_0 = \beta(\gamma_0)$ with at least probability $1-\alpha$.

An appealing aspect of pointwise uncertainty regions is that they retain the usual duality between confidence intervals and hypothesis testing. Namely, one can test the null hypothesis $H_0 : \beta_0 = \beta_c$ versus $H_a : \beta_0 \neq \beta_c$ for some specific $\beta_c$ at the $\alpha$ significance level by rejecting $H_0$ when the $(1-\alpha)$ pointwise uncertainty region $UR_p(\beta,\Gamma)$ excludes $\beta_c$. This is easily shown by noting for $\beta_c = \beta(\gamma_0)$

$$\Pr\nolimits_{f_0}[\text{reject } H_0] = 1 - \Pr\nolimits_{f_0}\{\beta(\gamma_0) \in UR_p(\beta,\Gamma)\} \leq 1 - \inf_{\gamma \in \Gamma} \Pr\nolimits_{f_0}\{\beta(\gamma) \in UR_p(\beta,\Gamma)\} \leq \alpha,$$

where the last inequality follows because $UR_p(\beta,\Gamma)$ is a $(1-\alpha)$ pointwise uncertainty region.

Various methods under different assumptions have been proposed for constructing pointwise uncertainty regions. Imbens and Manski (2004) and Vansteelandt et al. (2006) proposed a simple method for constructing pointwise uncertainty regions for a scalar $\beta$ with ignorance region $[\beta_l, \beta_u]$. Let $\gamma_l, \gamma_u \in \Gamma$ be the values of the sensitivity parameter such that $\beta_l = \beta(\gamma_l)$ and $\beta_u = \beta(\gamma_u)$. Assume

(38) There exist $\hat{\beta}_l$ such that $\sqrt{n}(\hat{\beta}_l - \beta_l) \xrightarrow{d} N(0,\sigma_l^2)$ and $\hat{\beta}_u$ such that $\sqrt{n}(\hat{\beta}_u - \beta_u) \xrightarrow{d} N(0,\sigma_u^2)$.

(39) The values $\gamma_l$ and $\gamma_u$ are the same for all possible observed data laws.

Under assumptions (38) and (39), an asymptotic $(1-\alpha)$ pointwise uncertainty interval for $\beta_0$ is

$$UR_p(\beta,\Gamma) = \bigl[\hat{\beta}_l - c_\alpha \hat{\sigma}_l/\sqrt{n},\; \hat{\beta}_u + c_\alpha \hat{\sigma}_u/\sqrt{n}\bigr], \tag{40}$$

where $c_\alpha$ satisfies

$$\Phi\left(c_\alpha + \frac{\sqrt{n}(\hat{\beta}_u - \hat{\beta}_l)}{\max\{\hat{\sigma}_l, \hat{\sigma}_u\}}\right) - \Phi(-c_\alpha) = 1 - \alpha, \tag{41}$$

$\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal variate, and $\hat{\sigma}_l$ and $\hat{\sigma}_u$ are consistent estimators of $\sigma_l$ and $\sigma_u$, respectively (Imbens and Manski (2004); Vansteelandt et al. (2006)). Note if $\hat{\beta}_u - \hat{\beta}_l > 0$ and $n$ is large such that the left-hand side of (41) is approximately equal to $1 - \Phi(-c_\alpha)$, then $c_\alpha \approx z_{1-\alpha}$, the $(1-\alpha)$ quantile of a standard normal distribution. In contrast, if $\hat{\beta}_u = \hat{\beta}_l$, then $c_\alpha = z_{1-\alpha/2}$.
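Because (41) has no closed-form solution for $c_\alpha$, it is typically solved numerically. The following sketch uses simple bisection with the Python standard library. It takes the standard errors $\hat{\sigma}_l/\sqrt{n}$ and $\hat{\sigma}_u/\sqrt{n}$ as inputs, which is an equivalent parameterization chosen here for convenience. The function names and the example numbers are illustrative assumptions, not part of the cited methods.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def critical_value(beta_l_hat, beta_u_hat, se_l, se_u, alpha=0.05, tol=1e-10):
    """Solve (41) for c_alpha by bisection.

    se_l and se_u are standard errors, i.e. sigma_hat / sqrt(n), so that
    sqrt(n) * (beta_u_hat - beta_l_hat) / max(sigma_l_hat, sigma_u_hat)
    equals (beta_u_hat - beta_l_hat) / max(se_l, se_u).
    """
    if beta_u_hat < beta_l_hat:
        raise ValueError("beta_u_hat must be at least beta_l_hat")
    shift = (beta_u_hat - beta_l_hat) / max(se_l, se_u)

    def coverage(c):
        return _STD_NORMAL.cdf(c + shift) - _STD_NORMAL.cdf(-c)

    lo = _STD_NORMAL.inv_cdf(1 - alpha)       # limiting value as shift grows
    hi = _STD_NORMAL.inv_cdf(1 - alpha / 2)   # exact value when shift is zero
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pointwise_interval(beta_l_hat, beta_u_hat, se_l, se_u, alpha=0.05):
    """Asymptotic (1 - alpha) pointwise uncertainty interval, as in (40)."""
    c = critical_value(beta_l_hat, beta_u_hat, se_l, se_u, alpha)
    return beta_l_hat - c * se_l, beta_u_hat + c * se_u


# Hypothetical estimates and standard errors, for illustration only.
interval = pointwise_interval(-0.5, -0.2, 0.05, 0.04)
```

The bisection bracket reflects the two limiting cases noted above: the solution always lies between $z_{1-\alpha}$ and $z_{1-\alpha/2}$.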

In addition to the pointwise uncertainty region, Horowitz and Manski (2000) and Vansteelandt et al. (2006) define a $(1-\alpha)$ strong uncertainty region for $\beta_0$ to be a region $UR_s(\beta,\Gamma)$ such that

$$\Pr\nolimits_{f_0}\{ir_{f_0}(\beta,\Gamma) \subseteq UR_s(\beta,\Gamma)\} \geq 1 - \alpha,$$

that is, $UR_s(\beta,\Gamma)$ contains the entire ignorance region with probability at least $1-\alpha$. Whereas the pointwise uncertainty region can be viewed as a confidence region for the partially identifiable target parameter $\beta_0$, the strong uncertainty region is a confidence region for the ignorance region $ir_{f_0}(\beta,\Gamma)$. Clearly, any strong uncertainty region will also be a (conservative) pointwise uncertainty region as $\beta_0 \in ir_{f_0}(\beta,\Gamma)$. Under assumptions (38) and (39), an asymptotic $(1-\alpha)$ strong uncertainty interval for scalar $\beta_0$ is simply

$$UR_s(\beta,\Gamma) = \bigl[\hat{\beta}_l - z_{1-\alpha/2}\,\hat{\sigma}_l/\sqrt{n},\; \hat{\beta}_u + z_{1-\alpha/2}\,\hat{\sigma}_u/\sqrt{n}\bigr]. \tag{42}$$

Note that (42) is equivalent to the union of all pointwise $(1-\alpha)$ confidence intervals for $\beta(\gamma)$ under $\mathcal{M}(\gamma)$ over all $\gamma \in \Gamma$, which is a simple approach often employed when reporting sensitivity analysis. Because strong uncertainty intervals are necessarily pointwise intervals, this simple approach is also a valid method for computing pointwise intervals, although intervals based on (40) will always be as narrow or narrower.
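The "union of Wald intervals" approach can be sketched as below, assuming estimates and standard errors have been computed on a finite grid of $\gamma$ values. The function name and inputs are hypothetical. The function returns the smallest interval containing the union, which coincides with (42) when the smallest lower limit and the largest upper limit occur at $\gamma_l$ and $\gamma_u$, respectively.

```python
from statistics import NormalDist


def union_of_wald_intervals(estimates, std_errors, alpha=0.05):
    """Smallest interval containing all (1 - alpha) Wald intervals, one per gamma."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equally long, non-empty inputs")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lowers = [b - z * se for b, se in zip(estimates, std_errors)]
    uppers = [b + z * se for b, se in zip(estimates, std_errors)]
    return min(lowers), max(uppers)


# Hypothetical estimates of beta(gamma) at three gamma values, with standard errors.
strong = union_of_wald_intervals([-0.5, -0.35, -0.2], [0.05, 0.045, 0.04])
```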

The two key assumptions (38) and (39) may not hold in general. For example, (38) may not hold for all possible observed data distributions, particularly for extreme values of $\gamma_l$ or $\gamma_u$. Assumption (39) may not hold if different observed data distributions place different constraints on the possible range of $\gamma$ or if $\Gamma$ is chosen by the data analyst on the basis of the observed data. If (38) or (39) does not hold, alternative inferential methods are needed (e.g., see Vansteelandt and Goetghebeur (2001); Horowitz and Manski (2006); Chernozhukov, Hong and Tamer (2007); Romano and Shaikh (2008); Stoye (2009); Todem, Fine and Peng (2010); Bugni (2010)).

A third approach to quantifying uncertainty due to sampling variability is to consider $\beta(\cdot)$ as a function of $\gamma$ and construct a $(1-\alpha)$ simultaneous confidence band for the function $\beta(\cdot)$. That is, a random function $CB(\cdot)$ is found such that

$$\Pr\nolimits_{f_0}\{\beta(\gamma) \in CB(\gamma) \text{ for all } \gamma \in \Gamma\} \geq 1 - \alpha.$$

It follows immediately that $\bigcup_{\gamma \in \Gamma} CB(\gamma)$ is a $(1-\alpha)$ strong uncertainty region (and thus a pointwise uncertainty region as well). Todem, Fine and Peng (2010) suggest a bootstrap approach to constructing confidence bands.

Whether pointwise uncertainty regions, strong uncertainty regions, or confidence bands are preferred will be context specific. Typically, it is of interest to draw inference about a single target parameter and not the entire ignorance region. Thus, in general pointwise uncertainty regions may have greater utility than strong uncertainty regions. Because strong uncertainty regions are necessarily conservative pointwise uncertainty regions, the strong regions can be useful in settings where determining a pointwise region is more difficult. Additionally, in some settings it may be of interest to assess whether $\beta$ is nonzero, for example, if $\beta$ denotes the effect of treatment. In these settings, computing a confidence band $CB(\cdot)$ has the advantage of providing the subset of $\Gamma$ where the null hypothesis $\beta(\gamma) = 0$ can be rejected. This is especially appealing if $\gamma$ is scalar, in which case a confidence band (as in Figure 3 of Todem, Fine and Peng (2010)) provides a simple approach to reporting sensitivity analysis results. On the other hand, if $\gamma$ is multidimensional, visualizing confidence bands can be difficult and instead reporting the (pointwise or strong) uncertainty region may be more practical.
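Given a confidence band already evaluated on a grid of scalar $\gamma$ values, the subset of the grid where the null hypothesis can be rejected is read off directly. The sketch below assumes the band limits were produced by some valid simultaneous procedure, which is not shown here. The function name and the example band are hypothetical.

```python
def gammas_rejecting_null(gamma_grid, band_lower, band_upper, null_value=0.0):
    """Return the grid values of gamma whose band interval excludes null_value."""
    if not (len(gamma_grid) == len(band_lower) == len(band_upper)):
        raise ValueError("grid and band limits must have the same length")
    return [g for g, lo, hi in zip(gamma_grid, band_lower, band_upper)
            if not (lo <= null_value <= hi)]


# Hypothetical band: it excludes zero at gamma = -1 and 0, but not at gamma = 1.
rejected = gammas_rejecting_null([-1, 0, 1], [-0.6, -0.4, -0.2], [-0.2, -0.05, 0.1])
```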

7.3 Data Example 

Fig. 1. Graphical depiction of the bounds and sensitivity analysis model described in Sections 3.3 and 3.4. The solid thin line with negative slope represents a set of joint distribution functions of $(Z,S(1),S(0),Y(1),Y(0))$ that all give rise to the same distribution of the observable random variables $(Z,S,Y)$. The four dotted curves depict the log odds ratio selection model for $\gamma = 0,1,2,4$. The $\gamma = 0$ model is equivalent to the no selection model. Each selection model identifies exactly one pair of expectations from this set, rendering the principal effect (10) identifiable. The thick black lines on the edge of the unit square correspond to the lower bound of the principal effect.

Table 1
Pertussis vaccine study data: Estimated ignorance regions and 95% pointwise and strong uncertainty regions of $\beta = E[Y(1) - Y(0) \mid S^{P_0} = (1,1)]$ for different $\Gamma$

$\Gamma$       $ir_{f_0}(\beta,\Gamma)$   $UR_p(\beta,\Gamma)$   $UR_s(\beta,\Gamma)$
[-3, 3]        [-0.49, -0.17]             [-0.58, -0.07]         [-0.59, -0.06]
[-5, 5]        [-0.55, -0.15]             [-0.66, -0.05]         [-0.69, -0.03]
[-10, 10]      [-0.57, -0.15]             [-0.70, -0.04]         [-0.73, -0.02]
(-∞, ∞)        [-0.57, -0.15]             [-0.70, -0.04]         [-0.73, -0.02]

Returning to the pertussis vaccine study described in Section 3, an analysis that ignores the potential for selection bias might entail computing a naive estimator (the difference in empirical means of $Y$ between the vaccinated and unvaccinated amongst those infected) along with a 95% Wald confidence interval, which would be -0.31 (95% CI -0.38, -0.23). If the sensitivity analysis approach in Section 3.4 is applied, the parameter of interest $\beta(\gamma) = E[Y(1) - Y(0) \mid S^{P_0} = (1,1)]$ is identified for fixed values of the sensitivity analysis parameter $\gamma$ given in (19). For fixed $\gamma$, $E[Y(0) \mid S^{P_0} = (1,1)]$ is determined by the intersection of the negative sloped line (14) and the curve (19), which is illustrated in Figure 1 for the pertussis data. Because $E[Y(0) \mid S^{P_0} = (1,1)]$ increases with $\gamma$, $\beta(\gamma)$ is a monotonically decreasing function of $\gamma$. Therefore $\gamma_l$ and $\gamma_u$ equal the maximum and minimum values of $\Gamma$ regardless of the observed data law, indicating (39) holds provided that $\Gamma$ is chosen by the analyst independent of the observed data. For $\gamma$ fixed and finite, $\beta(\gamma)$ can be estimated via nonparametric maximum likelihood (i.e., without any additional assumptions). This estimator will be consistent and asymptotically normal under standard regularity conditions if $\Pr[S(0) > S(1)] > 0$ (i.e., the vaccine has a protective effect against infection). For $\gamma = \pm\infty$ and $\Pr[S(0) > S(1)] > 0$, Lee (2009) proved that the estimators of the bounds similar to those given in Section 3.3 are consistent and asymptotically normal for a continuous outcome $Y$. The limiting distribution of the estimator of the upper bound ($\gamma = -\infty$) for a binary outcome will be normal if in addition condition (43) holds, and similarly the estimator of the lower bound ($\gamma = \infty$) will be asymptotically normal if in addition

$$E[Y \mid S = 1, Z = 0] \neq \frac{\Pr[S = 1 \mid Z = 1]}{\Pr[S = 1 \mid Z = 0]}. \tag{44}$$

Likelihood ratio tests for the null hypotheses that 
(43) and (44) do not hold yield p-values p < 10“^ 
and p = 0.18, respectively, indicating strong evidence that (43) holds and equivocal evidence regarding (44). Assuming (43) and (44) both hold implies (38), such that (40) and (42) can be used to construct $(1-\alpha)$ pointwise and strong uncertainty intervals for $\beta_0$. Estimated ignorance and uncertainty intervals of $\beta_0$ for different choices of $\Gamma$ are given in Table 1 and Figure 2, with standard error estimates obtained using the observed information. Even for $\Gamma = (-\infty, \infty)$ both the pointwise and strong uncertainty intervals exclude zero, indicating a significant effect of vaccination. In particular, with 95% confidence we can conclude the vaccine decreased the risk of severe disease among individuals who would have become infected regardless of vaccination.
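As a rough consistency check on the $\Gamma = (-\infty, \infty)$ row of Table 1, approximate standard errors can be backed out of the rounded strong interval and then used to recompute the pointwise interval. This sketch assumes the strong interval has the Wald form (42) and that the ignorance interval is wide enough relative to the standard errors for $c_\alpha \approx z_{0.95}$. The backed-out standard errors are approximations derived from rounded table entries, not values reported in the study.

```python
from statistics import NormalDist

z975 = NormalDist().inv_cdf(0.975)
z95 = NormalDist().inv_cdf(0.95)

# Rounded entries from the Gamma = (-inf, inf) row of Table 1.
ir_l, ir_u = -0.57, -0.15      # estimated ignorance interval
urs_l, urs_u = -0.73, -0.02    # 95% strong uncertainty interval

# Approximate standard errors implied by the Wald form of the strong interval.
se_l = (ir_l - urs_l) / z975
se_u = (urs_u - ir_u) / z975

# Pointwise interval using c_alpha approximately equal to z_{0.95}.
urp_l = ir_l - z95 * se_l
urp_u = ir_u + z95 * se_u
```

Rounded to two decimals, `urp_l` and `urp_u` agree with the tabulated pointwise interval for this row. Applying the same calculation to the other rows can differ in the last digit because the inputs are themselves rounded.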


Fig. 2. Estimated ignorance regions $ir_{f_0}(\beta,\Gamma)$ and 95% pointwise uncertainty regions $UR_p(\beta,\Gamma)$ for the pertussis vaccine example in Section 7.3. The principal effect (10) is denoted $\beta$ and $\Gamma = [-\gamma_u, \gamma_u]$ for $\gamma_u$ along the horizontal axis. The curve given by the lower boundary of the area with black slanted lines corresponds to $\hat{\beta}_l$, the minimum of the estimated ignorance regions, and the upper bound of the area with black slanted lines corresponds to $\hat{\beta}_u$, the maximum of the estimated ignorance region. The curve given by the lower (upper) boundary of the gray shaded area corresponds to the minimum (maximum) of the 95% pointwise uncertainty region.

8. DISCUSSION

This paper considers conducting inference about the effect of a treatment (or exposure) on an outcome of interest. Unless treatment is randomly assigned and there is perfect compliance, the effect of treatment may be only partially identifiable from the observable data. Through the five settings in Sections 2-6, we discussed two approaches often employed to address partial identifiability: (i) bounding the treatment effect under minimal assumptions, or (ii) invoking additional untestable assumptions that render the treatment effect identifiable and then conducting sensitivity analysis to assess how inference about the treatment effect changes as the untestable assumptions are varied. Incorporating uncertainty due to sampling variability was discussed in Section 7, and throughout large-sample frequentist methods were considered. Analogous Bayesian approaches to partial identification (Gustafson (2010); Moon and Schorfheide (2012); Richardson, Evans and Robins (2011)) and sensitivity analysis (McCandless, Gustafson and Levy (2007); Gustafson et al. (2010)) have also been developed.

Determining treatment effect bounds is essentially a constrained optimization problem, where the constraints are determined by the relationship between the distributions of the observable random variables and of the potential outcomes under whichever assumptions are being made. In simple cases, such as in Section 2.1, bounds can easily be derived from first principles and may have simple closed forms; in more complicated settings, such as in Section 4, bounds may be determined using linear programming or other optimization methods. In many cases, calculating bounds under minimal assumptions may seem to be a meaningless exercise because the bounds are often quite wide and may not exclude the null of no treatment effect as seen with the “no assumptions” bounds in Section 2. On the contrary, in settings like this Robins and Greenland (1996) write: “Some argue against reporting bounds for nonidentifiable parameters, because bounds are often so wide as to be useless for making public health decisions. But we view the latter problem as a reason for reporting bounds in conjunction with other analyses: Wide bounds make clear that the degree to which public health decisions are dependent on merging the data with strong prior beliefs.”
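To illustrate why bounds under minimal assumptions are wide, the sketch below computes the standard closed-form "no assumptions" bounds on $E[Y(1)] - E[Y(0)]$ for a binary outcome and binary treatment. It assumes only that the observed outcome equals the potential outcome under the treatment received, and it is assumed here to correspond to the simple case referred to in Section 2. The function name, argument names and example probabilities are hypothetical.

```python
def no_assumption_ate_bounds(pr_y1_z1, pr_y0_z1, pr_y1_z0, pr_y0_z0):
    """Bounds on E[Y(1)] - E[Y(0)] for binary Y and Z from the joint law of (Y, Z).

    pr_yA_zB denotes Pr[Y = A, Z = B].
    """
    total = pr_y1_z1 + pr_y0_z1 + pr_y1_z0 + pr_y0_z0
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    pr_z1 = pr_y1_z1 + pr_y0_z1
    pr_z0 = pr_y1_z0 + pr_y0_z0
    # E[Y(1)] is observed among the treated; among the untreated the missing
    # potential outcome Y(1) could be anywhere between all 0 and all 1.
    ey1_lo, ey1_hi = pr_y1_z1, pr_y1_z1 + pr_z0
    # Symmetric argument for E[Y(0)].
    ey0_lo, ey0_hi = pr_y1_z0, pr_y1_z0 + pr_z1
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo


# Hypothetical joint distribution of (Y, Z), for illustration only.
lower, upper = no_assumption_ate_bounds(0.3, 0.2, 0.1, 0.4)
```

Whatever the observed distribution, these bounds have width one and always contain zero, which is the sense in which they cannot exclude the null of no treatment effect.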

Bounds may be narrowed by reducing the feasible region of the optimization problem. This may be accomplished by considering further assumptions that place restrictions on either the distributions of the potential outcomes, the distributions of the observable random variables, or both. Assumptions that place restrictions on the observable random variables may have implications which are testable. If the observed data provide evidence against any assumptions being considered, bounds should be computed without making these assumptions. Those assumptions without testable implications can only be determined to be plausible or not by subject matter experts.

A potentially less conservative approach to computing bounds is to make untestable assumptions which identify the causal estimand and then assess the robustness of inference drawn to departures from these assumptions in a sensitivity analysis. A general guideline for specifying the sensitivity analysis parameters representing these departures is to choose parameters that are easily interpretable to subject matter experts. Parameter specification will depend on whether or not sensitivity analysis is conducted by directly modeling the association of an unmeasured confounder $U$ with treatment selection and the potential outcomes. Sensitivity analyses based on this approach are applicable when the existence of $U$ is known and there is some historical knowledge of the magnitude of the association of $U$ with $Z$ and the potential outcomes (Robins (1999); Brumback et al. (2004)). Otherwise, alternative approaches based on directly modeling the unobserved potential outcome distributions may be preferred. A second guiding principle should be to avoid specifications of sensitivity parameters that place restrictions on the distributions of observable random variables that are not empirically supported. A third consideration when conducting sensitivity analysis concerns determining a plausible region of the sensitivity parameters. That the region be chosen prior to data analysis is in general necessary for inference, such as described in Section 7, to be valid. Choice of the region of the sensitivity parameters may be dictated by whether one wants to consider only mild or also severe departures from the identifying assumptions. If the identifying assumption in question is considered plausible, then it may be that only mild departures from the assumption are deemed necessary for the sensitivity analysis. In this case, subject matter experts can be consulted to determine, prior to data analysis, a plausible region for the sensitivity parameters. If, on the other hand, severe departures from untestable identifying assumptions are to be entertained, sensitivity analyses should be conducted over all possible values of the sensitivity parameters. Sensitivity analyses which consider all possible full data distributions that marginalize to the observed data distribution will yield ignorance regions containing the bounds.

Though the examples presented here demonstrate the broad scope of scenarios where bounds and sensitivity analysis methods have been derived and employed to draw inference about treatment effects, they certainly are not exhaustive of all settings where these methods have been developed. For instance, VanderWeele, Mukherjee and Chen (2012) consider sensitivity analysis to unmeasured confounding for causal interaction effects. Bounds and sensitivity analysis methods have also recently been considered in the presence of interference, that is, in settings where treatment of one individual may affect the outcome of another individual, such as in social networks (Ver Steeg and Galstyan (2010); VanderWeele (2011b); Manski (2013)). For studies where sensitivity analyses are planned or anticipated, Rosenbaum and colleagues have examined how aspects of study design and the choice of statistical tests or estimators may affect the power or precision of the sensitivity analyses to be conducted (Heller, Rosenbaum and Small (2009); Rosenbaum (2010a, 2010b, 2011)).

Bounds and sensitivity analyses of treatment effects have been utilized in various substantive settings, such as biomedical research (e.g., Cole et al. (2005); Rerks-Ngarm et al. (2009); VanderWeele and Hernandez-Diaz (2011); Hu et al. (2012)) and economics (e.g., Heckman (2001); Sianesi (2004); Armstrong, Guay and Weber (2010)). Nonetheless, despite the wide range of settings in which these methods are applicable, their use in substantive settings remains somewhat limited. Given the large amount of literature detailing their broad scope of applicability and that formal inferential methods for partially identifiable parameters are now available, hopefully these approaches will be employed with greater frequency in substantive settings in the future.

The sensitivity analyses described throughout this paper focus on departures from untestable assumptions which identify treatment effects. Other types of sensitivity analyses might be considered as well, for example, to assess how robust inferences are to various analytical decisions that are invariably made in data analysis. Rosenbaum (2002, Section 11.9) refers to such assessment as “stability analysis,” in contrast to the types of sensitivity analyses discussed above. See Rosenbaum (1999, 2002) and Morgan and Winship (2007, Section 6.2) for further discussion regarding various types of sensitivity analyses beyond the type considered here.